The Road to Greatness: The Do’s and Don’ts of Sharing the Results of your Optimization Efforts

After your optimization program has been running for a while, there will naturally come times when you get to share the amazing success that running your program correctly has generated. Often, groups are so happy to get to brag that they don’t realize they are causing the program long-term problems.

There are many keys to sharing results correctly, but the most important thing to remember is why you are sharing them. Yes, you are talking to others about how great you are, but ultimately results are not about the past; they are about the future. Should others invest in the program? Should they expand where and what you optimize? How do others think about and interact with optimization, and most importantly, why should they listen to you and your team about how and why to run certain tests? The critical moments come not from sharing the results, but from framing why and how you got those results, and what that can do for other parts of the organization. Focus on the wrong parts, and you are asking for far more trouble than most can imagine.

Some of the worst moments for programs come after big results presentations, where they have gotten buy-in from others and a massive expansion of the program, but have failed to share the right message and to help others understand that testing is often not what they think it is. These moments inevitably lead to large resource drains, negative impressions, and massive time sinks for the original testing group, leaving them frustrated and generating less total revenue than before the influx of resources. Looking back 6 and 12 months later, you can easily see the moment when things went south.

With all of those concerns in mind, here are the keys to successfully sharing results within your organization.

DO – Focus on what you got by proving assumptions wrong

It is fun to talk about getting an 8% lift, or a 20% lift across multiple tests, but often the lift is secondary to how you got it. If you are testing to find the optimal use of resources, then you will inevitably find many cases where popular assumptions have been proven wrong. As you share your results, it is important that this is the primary part of the message. It is not “we got a 15% lift”; it is “everyone wanted to do X, which would have generated a 3% lift, but we found that of these 5 feasible alternatives, doing Y was dramatically better and generated a 15% lift, 12 points more than we would have gotten if all we did was test what people thought was going to win.”

DON’T – Report test results as a single revenue number

It is easy and tempting to report results as, “the test generated 6.2 million in additional revenue.” The problem is that there is no way to know that specific figure with accuracy, and you will lose trust later when the P&L does not show that exact gain. I understand how impressive a single number is and how much credit it can earn you, but ultimately it is far more damaging than the temporary good it might generate.

Even if we ignore all the real-world problems with confidence, it is important to understand that confidence and most similar measures describe the likelihood of a pattern, not the actual outcome. If I have a 10% lift at 96% confidence, that does not mean I am 96% confident of getting a 10% lift; it only means I am 96% confident that the measured experience will beat control. Confidence intervals can also be tricky because of the many assumptions baked into the Gaussian bell curves they are based on.
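
To make the distinction concrete, here is a minimal sketch using simple Beta posteriors over hypothetical conversion counts (the counts, the priors, and the Beta model itself are my assumptions, not a prescribed method). It shows how the probability of beating control can be high even while the plausible range for the lift itself stays very wide.

```python
# Minimal sketch: "confidence" speaks to direction, not magnitude.
# Hypothetical conversion counts modeled with simple Beta posteriors.
import numpy as np

rng = np.random.default_rng(42)

# Control: 500/10,000 conversions; variant: 550/10,000 (both hypothetical).
control = rng.beta(1 + 500, 1 + 10_000 - 500, size=100_000)
variant = rng.beta(1 + 550, 1 + 10_000 - 550, size=100_000)

lift = variant / control - 1
print(f"P(variant beats control): {np.mean(variant > control):.1%}")
print(f"Median lift: {np.median(lift):.1%}")
print(f"90% interval for the lift itself: "
      f"{np.percentile(lift, 5):.1%} to {np.percentile(lift, 95):.1%}")
```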

Instead, focus on reporting tests as a range, based on a preset multiplier. What that range is is somewhat arbitrary, as long as it is large enough to convey the real spread of possible outcomes. If I have not done deep analysis of past results, I will often report test results with a 50% – 200% range, so that the 6.2 million becomes an expected outcome of 3.1 to 12.4 million dollars. Ultimately the range is arbitrary, though you can look back at results over time to derive an expected range empirically. Express everything in a large but relevant range and you will avoid the massive credibility problems that false precision creates.
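
As a minimal sketch, assuming the arbitrary 50% – 200% multipliers above, the translation from a point estimate to a reported range is trivial to automate:

```python
# Minimal sketch: convert a point estimate into the wide preset range.
def report_range(point_estimate: float, low: float = 0.5, high: float = 2.0) -> str:
    return (f"expected impact: {point_estimate * low / 1e6:.1f} "
            f"to {point_estimate * high / 1e6:.1f} million")

print(report_range(6_200_000))  # expected impact: 3.1 to 12.4 million
```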

DO – Report all tests based solely on revenue impact

While you can’t report an absolute number, that does not mean you should not report the fiscal impact of a test. Translating all tests to a revenue figure lets you express your efforts in terms of the bottom line, while also letting you rationally compare results across tests. Even if you are not a retail site, you can translate leads or page views to an average value or a CPM. Revenue also forces you to evaluate your single success metric, ensuring it is tied to the purpose of your site and that you are not getting caught up in side goals that do not impact the bottom line.
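
A minimal sketch of that translation, assuming you have (or can estimate) an average lead value and an effective CPM; the helper names and figures here are hypothetical:

```python
# Minimal sketch: translate non-revenue metrics into a revenue figure.
def leads_to_revenue(leads: int, avg_lead_value: float) -> float:
    # avg_lead_value is an estimate you must source and state explicitly.
    return leads * avg_lead_value

def pageviews_to_revenue(pageviews: int, cpm: float) -> float:
    # CPM = revenue per 1,000 page views.
    return pageviews / 1_000 * cpm

print(leads_to_revenue(1_200, avg_lead_value=85.0))   # 102000.0
print(pageviews_to_revenue(4_000_000, cpm=12.5))      # 50000.0
```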

DON’T – Forget that you are measuring gross revenue, not net revenue

Except in rare circumstances, most groups end up measuring gross revenue when reporting impact to the business. This makes numbers seem much larger than they really are, and it often leads groups to overestimate their impact on the business as a whole. If you cannot express impact in terms of pure revenue generation, at least make it clear what numbers and assumptions you are using and what you expect the entire program to deliver to the bottom line. Nothing kills credibility faster than numbers that no rational executive can believe.
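
A minimal sketch of making the gross-to-net assumption explicit; the contribution margin here is a hypothetical input, not a benchmark:

```python
# Minimal sketch: state the gross-to-net assumption out loud.
def net_impact(gross_revenue_lift: float, contribution_margin: float) -> float:
    # contribution_margin is a hypothetical, explicitly stated assumption.
    return gross_revenue_lift * contribution_margin

# A "6.2 million gross" claim at a 22% margin is ~1.4 million to the bottom line.
print(net_impact(6_200_000, contribution_margin=0.22))  # 1364000.0
```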

DO – Report on the scale of impact of various tests

So much is missed if we do not look at patterns across tests. One of the critical things for groups to understand is that lift by itself does not tell you revenue. A very large lift on a small population is often worth far less than a much smaller lift on a large population. If you are translating all tests to revenue, then you can easily figure out where you have been able to generate the most revenue, which is not necessarily where you got the most lift. This active data acquisition is what allows you to plan ahead and increase resource efficiency, and it becomes vital for the long-term growth of a program. Often the lessons learned here reshape how people look at the impact of various channels. This type of analysis also helps people start to understand the difference between revenue allocation and revenue generation.
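
A minimal worked example (all figures hypothetical) of why lift alone misleads:

```python
# Minimal sketch: revenue impact = population x value per visitor x lift.
def revenue_impact(visitors: int, value_per_visitor: float, lift: float) -> float:
    return visitors * value_per_visitor * lift

# A 40% lift on a tiny page vs. a 5% lift on a high-traffic page.
print(revenue_impact(20_000, 4.0, lift=0.40))      # 32000.0
print(revenue_impact(2_000_000, 4.0, lift=0.05))   # 400000.0
```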

DON’T – Forget that the most valuable result from a test is not the lift

It is vital that over time you build a deep causal understanding of your ability to influence various parts of the user experience, as well as various user groups, and what it costs to do so. While it is fun to talk about the revenue impact, knowing that in 4 out of 5 places changing content had little impact compared to spatial changes completely changes how other parts of the business operate. These lessons, about where you have been able to make an impact, how, and what it took to do it, can help shape entire product roadmaps and help drive exponential revenue generation in the future.

There is no better place to show that testing is not just a list of actions, but an active acquisition of knowledge, than in when and how you talk about results. Failing to look at these patterns across tests, and failing to use them as a filter for your other data, can lead to massively inefficient uses of resources. Your program is worth far more than the individual actions you take, so why would you let others overly focus on individual tests when it is the act of optimization that drives the largest value opportunities? Make the focus what you learned, how, why, and what the impact was, and you will be able to make others see what testing can really do for them.

There is no better way to see where a program really stands than to watch how it communicates results. You can tell how efficient the group has been, how they work with other teams, and most of all how much the personal egos of the people on both ends of the presentation get in the way of real, meaningful results. If you think about and focus on the right parts of expressing results, you will be able to move forward and really change your organization. Nothing drives others to invest in and expand a program’s impact like showing that it improves every other part of the business. Focus on just the lift and just the numbers, and you are setting yourself and others up for failure.

How Analysis Goes Wrong: The Week in Awful Analysis – Week #7

How Analysis Goes Wrong is a weekly series focused on evaluating common forms of business analysis. All evaluation of the analysis is done with one goal in mind: does the analysis present a solid case that spending resources in the manner recommended will generate more additional revenue than any other action the company could take with the same resources? The goal here is not to knock down analytics; it is to help highlight practices that are unknowingly damaging the credibility of the rational use of data. What you don’t do is often more important than what you choose to do. All names and figures have been altered where appropriate to mask the “guilty”.

Sometimes the worst mistakes are the ones we make when trying to impress others with our work. Nothing erodes credibility faster than making up numbers or using flawed logic to make it look like our work is the only reason the organization exists. Often, these are the mistakes we are least aware of. You may get away with this type of reporting in the short term, but as soon as someone looks behind the curtain, the group is left without meaningful answers and often spends the rest of its time updating resumes instead of improving analysis. A perfect example comes from a very public source, and while I normally would not name the direct example, in this case the speaker is so well known and “popular” that there is little need to mask identities. This week’s awful analysis comes from none other than Avinash and his blog Occam’s Razor.

Analysis – To show how much impact analytics has had, all you need to do is take the current revenue minus the past revenue, multiply it by the time period, and divide by the cost, and you get the ROI of analytics.
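
Spelled out as code, the naive calculation being critiqued looks something like this hypothetical sketch; every input is exactly the kind of number the points below call into question:

```python
# The naive "analytics ROI" calculation under critique, spelled out.
# Every input is hypothetical and, as argued below, unjustified.
def naive_analytics_roi(current_rev: float, past_rev: float,
                        periods: int, cost: float) -> float:
    gain = (current_rev - past_rev) * periods   # attributes 100% to analytics
    return gain / cost                          # and calls the ratio "ROI"

print(naive_analytics_roi(1_300_000, 1_000_000, periods=12, cost=400_000))  # 9.0
```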

It is often a good thing for people in analytics to understand how marketing can take data and exploit it for personal gain, but it is something very different when we fall into the exact same traps. There is so much wrong here that I hardly know where to begin, but let’s look at the highest-level problems:

1) It attributes 100% of the revenue gain to analytics, ignoring the fact that data is sinusoidal, meaning it goes up and down all the time with no direct interaction. The complexities of the entire marketplace make a pre/post analysis nearly useless for any kind of accuracy.

2) It assumes that the same revenue and resources spent here would not have been spent elsewhere. The people in these departments are not stupid, and while they may have followed 100% of your suggestions, that doesn’t mean they would not have gotten more by doing what they would have done otherwise. If they would have generated a 150% increase and your suggestions generated a 120% increase, you actually cost the business 30 points of growth; you did not gain 120%. The same holds for looking at a test only for what won, rather than the difference between what won and what would have won if the test had included only the original idea. In both cases, the value of the analysis lies in the expansion of opportunities, not in the original opportunity itself.
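
A quick worked version of that arithmetic (baseline and growth figures hypothetical):

```python
# Opportunity cost: measure against the best alternative, not against zero.
baseline = 1_000_000
with_suggestions = baseline * 2.20   # the "120% increase" you delivered
alternative      = baseline * 2.50   # the "150% increase" forgone

shortfall = (alternative - with_suggestions) / baseline
print(f"{shortfall:.0%} of baseline revenue forgone")  # 30%
```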

3) It focuses on the suggestions, not on the accuracy of the actual work. Some of the worst people with data are analysts, as they have the most opportunity to abuse the data to push their agenda over someone else’s. Nothing is worse for the industry as a whole than analysts who do not understand the limitations of their own opinions and the need for rational uses of data, rather than stories and meaningless suggestions.

4) The entire point is someone trying to find revenue opportunities from only passive interaction with data. There is no way to know the influence or cost of an action from passive (correlative) data, so why would you take any credit for additional revenue? This is the classic analyst fallacy of pretending that the case for action and the action to take are both available in the same data. Active (causal) data acquisition is the only way to get those pieces of information, yet here a claim is being made that requires that information without having it. There is nothing worse for the rational use of data than such an obvious abuse of personal agenda by the analytics team itself.

I have seen this type of analysis done in many different ways over the years. In all cases, it is a clear sign that the person doing the analysis does not understand what the data they report means, and that they are using their position for personal gain rather than organizational gain. I believe that data can add rationality and improve the performance of organizations by magnitudes, but the only way to do that is for analysts themselves to use their own data rationally. It is important that people understand your impact on the bottom line, but there are many ways to do that which do not require false statements and personal agendas.

Analytics can be a powerful tool that shapes organizations, but it can also be a weapon used to push one person’s agenda over another’s, with the result being no gain to the organization, just more internal politics. We hear so much talk about the power of data and the potential of big data, and while you can use it to predict things and build tools that leverage it, the reality is that until we have people running programs who are interested in the real impact to the business, all that “promise” will be nothing but empty air. Just because you have analytics, or just because you build a recommendation or a tool that uses data, does not inherently mean it is providing value to anyone. Value is additional growth of revenue in the most efficient way possible; it is not a flashy toy or your own personal agenda. If you want to see data use really expand throughout your organization, then stop abusing it yourself and help others see the real power of being able to explore and exploit information.

Getting Started Checklist – How to Set Your Organization up for Success with Your First Test

As personalization and testing continue to become more and more mainstream, a whole slew of groups are being introduced to testing, or may believe they have more functional optimization knowledge than they really do. So many groups would be better off if they just avoided a number of common pitfalls that befall new programs. While I have previously put together my list of the deadly sins of testing (no single success metric, weak leadership, failure to focus on efficiency, testing only what you want, confusing rate and value, and falling for the graveyard of knowledge), I want to instead give you a very quick checklist to make sure that your first few efforts in testing are less painful and far more fruitful than what normally happens.

It does not matter whether you have tested before or are new to testing. What matters is how you tackle today’s issues and how you set yourself up to succeed. Breaking down the components of a successful first test lets you see the action items and move toward the moments that really make or break a program. Nothing makes everyone’s life harder than starting out testing on the wrong foot, since everyone will assume that is how things are going to go from then on. With that in mind, if you do nothing but follow this simple checklist, you are going to be far better off.

1) Decide on what you are trying to accomplish – Pick one metric, across your entire site or org, that defines success for all your tests. This might be easy or hard, but it is vital. It is especially difficult for people coming from an analytics background who are used to reporting on multiple things and finding stories.

2) Pick one or two pages to start – Do not try to eat the entire elephant in one sitting.

3) Do not focus on your test ideas – You are going to have test ideas in mind, and you are going to want to talk about and focus resources on only that one test. Without fail, this is where most groups want to focus, but I cannot stress enough how unimportant your concept for improvement will be to a successful program.

4) Make sure your first test is very simple – Don’t start trying to do a full redesign of your entire site or completely change a user flow. Pick one page and one thing to start. If you are not sure, then pick a page and design your first test to measure the relative value of all the items on the page.

5) Decide on your initial segments – Make sure every test has a segment list. Defining segments up front, rather than focusing on a single one, will make it far easier to find exploitable segments and will start the process of learning even if you do not intend to act on them right away.

Here are some basic rules for segments to make your life easier (a sketch of what a pre-defined segment list might look like follows this list):

• Must be at least 5% of your total population (10% for a smaller site). This is total traffic, not just identified traffic.
• Must have a comparable measure (you can’t compare new users against Google users, since some Google users are also new users).
• Have to be defined PRIOR to the start of any campaign.
• Need to cover the 4 big user information types (the listed items are just examples):

Temporal – day parting, day of week
Physical – geo location, browser, operating system (basically all the stuff you get from the user agent string)
Referrer – how the user got to the site, keyword types, referrers, channels
Behavior – new/returning, used internal search before, purchaser/non-purchaser
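
As promised above, here is a minimal sketch of what a pre-defined segment list might look like in code; the dimension names mirror the four types, and every value is an example rather than a prescription:

```python
# Hypothetical segment definitions, declared BEFORE the campaign starts.
# Keys mirror the four user information types; values are examples only.
SEGMENTS = {
    "temporal": ["day_part", "day_of_week"],
    "physical": ["geo_location", "browser", "operating_system"],
    "referrer": ["entry_channel", "keyword_type", "referring_domain"],
    "behavior": ["new_vs_returning", "used_internal_search", "past_purchaser"],
}

def meets_size_floor(segment_share: float, min_share: float = 0.05) -> bool:
    """Enforce the 5%-of-total-population rule from the list above."""
    return segment_share >= min_share
```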

6) Start building a proper infrastructure – Build out a technical framework for testing across your entire site. This may not all be tied to the first test, though that will be part of it. Get access so that you can run tests on 80% of your pages, using a combination of local and global set-up. A little pain up front will save you lots of pain later; it is always best to avoid death by a thousand cuts whenever possible, even if you don’t see the issue immediately.

7) Decide on rules of action – Make sure everyone is very clear on how you are going to call a test and act on winners before you launch your first test (a sketch of such a rule follows this list).

8) Make sure you are not going to QA tests like you do other parts of your site – So many testing programs are destroyed by letting IT decide on the workflow and on how tests get QA’d. You may have to work with your IT group, but it is vital that testing is owned not by IT but by your marketing and product teams.

9) Point your resources towards feasible alternatives, not adding more metrics – Any additional time and additional resources need to be used on the following two things:

A) Adding as many alternatives to the test as possible, traffic permitting
B) Educating and working with groups to make sure they understand why you are testing and how you are going to act.

10) Remember that the most important time for your program is not the first test – Your first test is fun to focus on, but the period after your first test is where you can start to really focus on testing discipline. This is the time that defines whether you get good at just running tests, or if you build a rock star optimization program. So many groups fail because they miss how vital this time is.
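
To make step 7 concrete, here is a minimal sketch of a pre-agreed rule of action; the thresholds are placeholders to be negotiated before launch, not recommendations:

```python
# Hypothetical pre-agreed rule of action (step 7): thresholds are agreed
# BEFORE launch so nobody argues with the outcome afterwards.
def call_test(confidence: float, days_run: int, conversions_per_arm: int) -> str:
    if days_run < 14 or conversions_per_arm < 200:
        return "keep running"                    # guard against early peeking
    if confidence >= 0.95:
        return "act on the winner"
    return "stop: no actionable difference"

print(call_test(confidence=0.97, days_run=21, conversions_per_arm=450))
```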

There you go, 10 simple steps to make sure that your first moments in testing do not take your program astray. This is hardly the end of the road, but if you simply avoid setting yourself up for failure, then you can really start to look ahead to all the opportunity that is out there.

How Analysis Goes Wrong: The Week in Awful Analysis – Week #6

How Analysis Goes Wrong is a weekly series focused on evaluating common forms of business analysis. All evaluation of the analysis is done with one goal in mind: does the analysis present a solid case that spending resources in the manner recommended will generate more additional revenue than any other action the company could take with the same resources? The goal here is not to knock down analytics; it is to help highlight practices that are unknowingly damaging the credibility of the rational use of data. What you don’t do is often more important than what you choose to do. All names and figures have been altered where appropriate to mask the “guilty”.

There are many different ways to abuse statistics to extract meaning from data that it really doesn’t have. I often find myself quoting George Box: “All models are wrong, some are useful.” Since “big data” seems to be driving people to more and more “advanced” ways of slicing data, I wanted to start looking at some of the most common ways people misunderstand statistics, or at least misunderstand what you can do with the information presented by various modeling techniques.

The first technique I want to tackle is clustering, specifically K-means clustering. This type of modeling lets you divide a group of users by common characteristics. The analysis looks at various dimensions of users who end up doing some task, usually a purchase or total revenue spent, and then statistically derives the defining characteristics that differentiate one group from another. It is similar to decision-tree methods, but tends to be much more open-ended about how many groups there are and which dimensions are leveraged.
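
For reference, a minimal sketch of this kind of clustering using scikit-learn’s KMeans on hypothetical per-user features (the feature set and cluster count are assumptions for illustration):

```python
# Minimal sketch of K-means clustering with scikit-learn.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical matrix: one row per converter; columns might be visits,
# facebook touches, internal searches, and revenue (all made up here).
X = np.random.default_rng(0).random((500, 4))

model = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = model.fit_predict(StandardScaler().fit_transform(X))

# The centroids describe each group's common traits -- and, as argued
# below, that is ALL they describe.
print(model.cluster_centers_)
```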

A typical use of this data would be:

Analysis: Looking at our data set of converters, we were able to build 3 different personas of our users: low value, medium value, and high value. Based on the fact that high value users interact with facebook campaigns and internal search far more than others, we are going to spend resources to improve our facebook campaigns to get more people there, and look to make internal search more prominent for users in the other 2 persona groups.

Before we dive in, I feel it is necessary to remind everyone that the point here is not that optimizing facebook campaigns or internal search is right or wrong, only that the data presented in no way leads to that conclusion.

1) Fundamentally, one of the problems with clustering is that it only tells you the common traits of people already in a group. You have no way of knowing whether people who do not normally interact with your facebook campaigns will behave anything like those who currently do if you push them there. Most likely they won’t, and you have no way of knowing if that will generate any revenue at all.

2) There is a major graveyard effect going on: looking only at those who convert, and looking for defining differences among them, avoids looking at the total population and the differences between those who convert and those who don’t. There is a pretty good chance that people who don’t convert also use internal search.

3) Even if you assume that everything is correct and that the two areas are highly valuable, you still don’t have a read on what to change, what the cost of changing it is, or even what your ability to influence anything about that group is. You still have no way of saying that this is more valuable than any other action you could take (including doing nothing).

4) It assumes that because those are the defining areas, they are also the right places to interact with people. It is just as likely that getting people to sign up for email, for example, also gets them to look at content on facebook.

5) Personas as a whole can be very dangerous, since they create tunnel vision and can lead to groups not looking at the exploitability of other dimensions of the same population.

6) At the highest level, it confuses statistical confidence with colloquial confidence. The statistics tell you that these characteristics differ enough to define distinct groups. They in no way tell you that the differences matter to your business, or how to change your users’ behavior.

I am actually a huge fan of statistics and data modeling, but only as a way to think about populations. I get very worried when groups follow the results blindly, or do not actively interact with the data to uncover the important information, like cost and influence. If you have an analysis like this and you know the exploitability of the different dimensions, then you can do another step of analysis and look at the size of each population and your ability to change it, based on what you know of the exploitability of its defining characteristics. If you do not have that, then the data is interesting but ultimately almost useless.