Category: Methodology

The Road to Greatness: The Do’s and Don’ts of Starting an Optimization Program

As more and more programs emerge with the growth of the online optimization field, a preponderance of “best practices” has built up around testing, personalization, and all other active forms of leveraging data. It can seem like you have to know a massive amount just to understand what those “experts” are saying.

With that in mind, I wanted to present some very simple do’s and don’ts for programs just getting going. Starting correctly and setting the stage for success is vital to being efficient and getting the value out of your program that you can and should. The problem is that in almost all cases people’s first instincts lead them astray. What you don’t do is usually more important than what you choose to do. The key is to make sure that you focus your limited time on the actions that will provide the greatest growth and value to your program. The same advice works for groups that have been testing for years, as many of those programs are just built-up versions of the same bad behaviors.

DO – Hold discussions about a single success metric

The very first and sometimes the most painful hurdle that a program faces is getting groups to agree on how to act. This is in many ways counter to the prevailing culture, as many groups have competing goals and are only focused on their little piece of the larger pie. If you do nothing else, getting people to agree on the one measure that you can all make decisions on is vital.

A side benefit of this conversation is that it starts the process of allowing people to dissociate the actions they think will lead to success from the actual measure of success. Way too many people think that if their idea is that more people looking at product X will generate additional revenue, then the measure of success is more people looking at product X. You may have an idea for what you want to do, but you are doing it to accomplish a goal, so measure the goal, not the action. The measure of success would be additional revenue, and once that is the only goal, you can start comparing all feasible ways to achieve that exact goal.

DON’T – Get too caught up on test ideas

One of the least important parts of a testing program is the generation of test ideas. While this is the fun part, where people try to prove their point, the keys to success are not in having a bunch of ideas but in putting together the infrastructure and instilling the discipline of successful testing. Test ideas will come naturally out of everyday conversations, and especially out of prior tests and learned knowledge. There is never a lack of things you can do, but focusing too much on that part lets people get caught up in the many biases that cause their rational evaluation of the results to collide with their ego.

DO – Apply tech resources to a larger infrastructure

All tools require some sort of deployment, and while some are easier than others, the biggest mistake you can make is to think that every test will require a massive amount of resources. If you build a proper infrastructure across your site, then most tests will not require any involvement from development resources whatsoever.

The key to a good infrastructure is to have tagging in the key locations on your top pages so that you can test just about anything. You will also need to make sure that you have tracking in place for your success metric, and for any additional information (like segment information) that you may want to provide.

Testing should not be thought of as a project but as an ongoing organization and site feature. It should be set up in a way that never stops and is never about the simple validation of a single idea. Going through the initial “pain” of a larger deployment, and making sure that your IT group understands that this is not a permanent engineering-owned project, will dramatically improve your ability to move quickly later on. Once this is done, the key is to prioritize tests by resource usage, favoring those that will deliver the greatest return for the lowest cost.

DON’T – Think testing is just an extension of your analytics group

How you think about optimization is almost the exact opposite of analytics. Instead of looking for patterns and anomalies in large data sets, you have a single data point and the push to make consistent, meaningful changes. Testing is not just the action arm of some analysis you did to validate your point; it is the active acquisition of and interaction with data.

To succeed, you need to think about segmentation differently. You need to think about what a success metric really is and how it is different in testing. You need to be able to speak in terms of comparative analysis, not validation. Basically, you have to be able to turn just about everything you do with analytics on its side. Later on, you can start leveraging the two together, but as you start, separating them completely is going to grant you far more return with far less work than trying to tack testing onto your daily analytics activities.

DO – Think about your rights management

Make sure you know who is going to have what rights, and make sure you have checks in place to keep too many people from changing your site.

DON’T – Blindly follow statistical measures

You don’t need to know everything about statistics, but you do need to understand some basic concepts to really understand results. The first is that for any statistical tool to be useful, you need not just statistical confidence but also data that is representative of the change you are going to make. If you get 99% confidence in 3 hours on a Friday afternoon, that data is only representative of that period of Friday afternoon.
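
To make this concrete, here is a minimal sketch of a standard two-proportion z-test, with made-up conversion numbers. The point is that the confidence the math reports says nothing about whether the sample window is representative: a short Friday-afternoon burst can read as over 99% confident while a full, representative week shows almost nothing.

```python
import math

def confidence(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test: returns 1 minus the two-sided p-value."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = abs(p_a - p_b) / se
    p_value = 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))
    return 1 - p_value

# Hypothetical numbers: a 3-hour Friday window happens to favor recipe B...
print(f"Friday window: {confidence(30, 500, 55, 500):.1%}")        # ~99.5%
# ...while a full, representative week shows no meaningful difference.
print(f"Full week:     {confidence(800, 14000, 815, 14000):.1%}")  # ~30%
```

The tool happily reports “significance” either way; only you know whether the sample represents the traffic your change will actually face.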

DO – Start thinking about how you are going to store and share results

When you are testing right, you are going to constantly learn new things, and those lessons will eventually be far more valuable than any individual result. You need to start thinking about where you are going to share this information, in what format, and with what availability. You also need to make sure that this is not a static item but a living knowledge base.

DON’T – Let any test go out with just two recipes

One of the hardest lessons to learn is that testing is not about validating a single point, but about comparing feasible alternatives and being prepared to go in directions that you never imagined. While not everyone will be ready for this on day one, the simplest way to prepare people is to force discipline on them. Requiring multiple, very different, but feasible alternatives will give you far more information and will start to show you areas where what people thought mattered didn’t.

There are a thousand other things that go into running a program, but just starting out, if you tackle these simple things and avoid some common traps, you will get far better results, make a larger impact on your business, and use far fewer resources. Think about what you really want from your program, then stop focusing on individual tasks and instead start putting the key pieces in place for long-term success.

7 Deadly Sins of Testing – Not Understanding Your Data

It doesn’t take long working in a data field before you come across data being used in ways other than what it was intended for. George Canning once correctly quipped, “I can prove anything by statistics except the truth.” One of the hardest struggles for anyone trying to make sense of the various data sources is understanding the data you are dealing with: what it is really telling you, what it is not telling you, and how you should act. We have all this rich, interesting information, but what is the right tool for the job? What is the right way to think about or leverage that data? One of the ways that testing programs lose value over time is when they stop evaluating their data with a critical eye and stop focusing on what it is really telling them. They so badly want to find meaning in things that they convince themselves and others of answers that the data could never provide. Understand your data, understand the amazing power it can provide, and understand the things it cannot tell you.

Every tool has its own use, and we get the most value when we use tools in the correct manner. Just having a tool does not mean it is the right fit for all jobs. When you come from an analytics background, you naturally look to solve problems with your preferred analytics solutions. When you come from a testing background, you naturally look to testing as the answer to all problems. The same is true for any background; when we are not sure, we are wired to turn back to what we are comfortable with. You get more value when you leverage each tool correctly, and the fastest way to do that is to understand what the data from each tool does and does not tell you.

Analytics is the world of correlative patterns, with a single data stream that you can parse and look backwards at. You can find interesting anomalies, compare rates of action, and build models based on large data sets. It is passive data acquisition that allows you to see where you have been. When used correctly, it can tell you what is not working and help you find things that you should explore. What it cannot do is tell you the value of any action directly, nor can it tell you the right way to change things.

Testing is the world of comparative analysis, with only a single data point available to identify patterns. It is not just a random tool for throwing one option against another to settle an internal argument, but a valuable resource for the active acquisition of knowledge. You can change part of a user experience and see its impact on an end goal. What you cannot do is answer “why?” with a single data point, nor can you attribute correlated events to your change. You can add discipline and rigor to gain more insight, but at its core all testing really tells you is the value of a specific change. Its quality is beholden to the quality of the input, just as your optimization program is beholden to the discipline used in designing and prioritizing opportunities.

Yet without fail, people look at one tool and claim it can do the other, or that the data tells them more than it really does. Whether it is confusing rate with value, or believing that a single data point can tell you the relationship between two separate metrics, the mistake we make is thinking that the information itself tells us the direction of a relationship, or the cost of interacting with it. This is vital information for optimization, yet so often groups pretend they have it and make suboptimal decisions.

We also fail to keep perspective on what the data actually represents. We get such tunnel vision on the impact to a specific segment or group that we lose view of the impact to the whole. To make this even worse, you will find groups targeting or isolating traffic in their tests, such as only new users, and then extrapolating the impact to the site as a whole. Our ability to target a specific group does not matter unless that change creates a positive outcome for the site. The first rule of any statistics is that your data must be representative. Another of my favorite quotes is, “Before you look at what the statistics are telling you, you must first look at what they are not telling you.”

Tools do not understand the quality of their inputs; it is up to the user to know whether the results are biased. Always remember the truth about any piece of information: “Data does not speak for itself – it needs context, and it needs skeptical evaluation.” Failure to evaluate it skeptically invalidates the data’s ability to inform the best decision. Data in the online world has specific challenges that sampling random people in the physical world does not have to account for. Our industry is littered with reports of results and best practices that ignore these fundamental truths about tools. It is so much easier to think you have a result and manipulate data to meet your expectations than it is to have discipline and to act in as unbiased a way as possible. When you get tunnel vision, whether in what you analyze or in the population you leverage, you are violating these rules and leaving the results highly questionable. Not understanding the context of your data is just as bad as, or worse than, not understanding the nature of your data.

The best way to think about analytics is as a doctor uses data. You come in for a visit, he talks to you, and you give him a pattern of events (my shoulder hurts, I feel sick, etc.). He then uses that information to reduce what he won’t do (if your shoulder hurts, he is not going to x-ray your knee or give you cough medicine). He then starts looking for ways to test that pattern. Really good doctors use those same tests to leave open the possibility that something else is the root cause (maybe a shoulder exam shows that you have back problems). Poor doctors just give you a pain pill and never look deeper into the issue. Knowing what data cannot tell you greatly increases the efficiency of the actions you can take, just as knowing how to actively acquire the information needed for the right answers, and how to act on that data, improves your ability to find the root cause of your problems.

A deep understanding of your data gives you the ability to act. You may not always know why something happens, but you can act decisively if you have clear rules of action and an understanding of how data interacts with the larger world. It is so easy to want more data, or to want to create a story that makes it easier for others to understand something. It is not that these are wrong, only that the data presented in no way validates that story, nor could it provide the answers that you are telling others it does. At its worst, you are distracting from the real issue; at its best, it is just additional cost and overhead to action.

Educating others and yourself on the value and uses of data is vital for the long-term growth of any program. If you do not understand the real nature of your data, then you are subject to biases which remove its ability to be valuable. There are thousands of misguided uses of data, all of which are easy to miss unless you are more interested in the usage of data than in its presentation and gathering. Do not think that just knowing how to implement a tool, or knowing how to read a report, tells you anything about the real information present in it. Take the time to evaluate what the information really represents, and to understand the pros and cons of any manipulation you do to that data. Just reading a blog or hearing someone speak at a conference does not give you enough information to understand the real nature of the tools at your disposal. Dive deep into the world of data and its disciplines, choose the right tools for the job, and then make sure that others are as comfortable with that information as you are. It can be difficult to get to those levels of conversation, or to convince others that they might be looking at data incorrectly, but those moments when you succeed can be the greatest moments for your program.

The Difference between Success and Failure with Personalization

Personalization is such a buzzword right now that it is nearly impossible to have a conversation in the digital marketing space without it coming up. Everyone is on a quest for a “personalized experience,” or to make sure that they are doing what every other group is doing. You constantly hear about all this new technology and all these new ways to accomplish the task. There are more tools and more information about our users now than ever before, and yet there are very few groups or people who can actually differentiate between success and failure for personalization.

The most fundamental thing people forget about “personalization” is that I can “personalize” an experience in almost infinite ways. I can change copy, I can change workflow, I can change the layout or features of the experience. Even better, I can do this for the same user in a thousand different ways. I am a returning user to your site… but I am also a user in the afternoon, who came from Google, who has been on the site 12 times, who has made 3 purchases, and who is using Firefox. So the question is not CAN I personalize an experience; at this point there are a thousand different tools and ways to do so. The simple act of creating an experience is not the goal; the goal is to do so in the way that generates the greatest ROI for my organization.

The question needs to be, how do I discover the most valuable way to change the experience?

What we need to incorporate in any concept of personalization is a way to measure these different concepts against each other. We have to build into every process a period of discovery, using tools that allow us to know the two most valuable pieces of information when it comes to personalization:

What is my ability to change their behavior?

What is the cost to do so?

There is no way to acquire that information without actively making changes and seeing the outcome. Measuring that different groups have different behavior is easy, but what does that tell you about your ability to change that behavior? Just because one group of users purchases twice as often as another, what tells you that you can change that behavior? How do you know that a differentiated experience will do anything more than serving the same static experience to both?

And that is the difference between success and failure when it comes to personalization. Are you just serving up an experience because you can? Or have you done the active acquisition of knowledge that shows not only that it improves performance, but that it is the best available way to increase performance?

I want to give a functional example so that you can see this in action. Let’s take the exact same concept and see it executed under both ways of thinking.

Let us say that it is coming up on the holiday season, and you want to serve up a holiday shipping message to people who have purchased on your site before.

If my goal is increased revenue, then the steps would be as follows:

Discovery
1. Create multiple executions of the message (with only one offer, how do you know whether the concept or the execution is the issue?)

2. Take 2 to 3 other messages that could be used there (one will most likely be your default content), along with other concepts such as specific products or specific site offerings. Hopefully you are just reusing existing content.

3. Serve all the offers to EVERYONE

4. Look at the results by segment and calculate the total gain from giving a differentiated experience (a minimal analysis sketch follows this list):
i. If you are correct, then the highest-performing recipe for the previous-shopper segment will be one of the shipping messages. Default content would then be the winner for the non-purchaser segment (the comparable segment).
ii. If you are wrong, then another segment will have a higher winner among the offers. Be open to a permutation winning that you never thought of. Being wrong is always going to provide the greatest return.

Exploitation
5. Push live the highest revenue producing opportunity found
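
Here is a minimal sketch of the step 4 analysis referenced above. Every offer name, segment name, and revenue-per-visitor figure is made up for illustration; the point is comparing the best single experience for everyone against the best experience per comparable segment, which is the total gain from differentiating.

```python
# Hypothetical revenue per visitor from serving every offer to everyone,
# broken out by comparable segment (equal segment sizes assumed for
# simplicity; in practice, weight each segment by its share of traffic).
results = {
    "default":           {"prior_purchasers": 2.10, "non_purchasers": 1.95},
    "holiday_ship_v1":   {"prior_purchasers": 2.45, "non_purchasers": 1.80},
    "holiday_ship_v2":   {"prior_purchasers": 2.60, "non_purchasers": 1.70},
    "product_spotlight": {"prior_purchasers": 2.05, "non_purchasers": 2.30},
}
segments = ["prior_purchasers", "non_purchasers"]

# Best single offer if everyone gets the same experience
best_overall = max(results, key=lambda o: sum(results[o].values()))
undifferentiated = sum(results[best_overall].values())

# Best offer per segment if we differentiate the experience
best_by_segment = {s: max(results, key=lambda o: results[o][s]) for s in segments}
differentiated = sum(results[best_by_segment[s]][s] for s in segments)

print(f"best single offer: {best_overall} -> {undifferentiated:.2f}")
print(f"best per segment:  {best_by_segment} -> {differentiated:.2f}")
print(f"gain from differentiating: {differentiated - undifferentiated:+.2f}")
```

Note that with these hypothetical numbers, the winner for non-purchasers is an offer nobody proposed for them, which is exactly the kind of “wrong” result step 4.ii is designed to surface.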

Let us see how groups that get little, no, or negative value from “personalization” do the same task:

1. Push the single piece of creative to the repeat purchaser segment.

2. Hope

The fundamental problem is that in the second scenario you have no way of knowing whether it is valuable. Blind belief that you are providing value is not the same as providing value. Most groups think that if they just report the outcome, or the rate of action of that group, that somehow represents the value of the action. It doesn’t. Value only comes from the improvement in performance caused by that action. If you aren’t actively acquiring that information, then you have no way of knowing the value of any action. Even worse, we are adding cost and suffering the opportunity cost of the gain we should be getting.

I want to show some simple math to illustrate the difference between the two groups. Let us say in the first test we have the 5 different experiences and that we are looking at only 10 different comparable segment groups (segments only matter if there is a different outcome for the comparable group). This might include things like new/returning, work hours/non-work hours, search/non-search, Firefox/Chrome/Internet Explorer, or any of the other infinite ways of dividing your users using any and all of the information available to you. You can always do more, but for the sake of argument and of efficiency, 10 different pools of the same population is enough. Segments are only valuable for targeting if we serve things to the comparable segment. If I assume that everything is purely random, then I have a 1 in 5 chance of my offer being the best. I also have a 1 in 10 chance of my segment being the MOST valuable.

(1/5) * (1/10) = 2%

So if everything is random, then I have a 2% chance that I picked the best outcome (the one that drives the highest revenue for my site), which means that in 98% of the scenarios, I have cost my site money. But let’s assume that you are REALLY good at picking segments and content based on your experience and your analysis. Having worked with nearly 300 different organizations, I have found that the best people who aren’t relying on causal data are no better than 2 times random at choosing a better option (they guess the right answer twice as often as a random pick).

Most groups do not fall into that category. In reality, most groups actually are worse than random at choosing the best option.

That means the math is only:

(2/5) * (2/10) = 8%

Let’s say you are the best person in the world at what you do, with great analysis and all sorts of tools, so that you are three times better:

(3/5) * (3/10) = 18%

So if you are absolutely amazing at what you do, then 18% of the time you will have guessed the right message for the right group. The other 82% of the time, another outcome is better, and most likely significantly better. With a few simple steps, and by accepting that we do not always understand the patterns before us, you can reduce the chance of missing a better-performing option to 0%. And if we go back to random chance, 20% of the time just doing nothing (your default offer) actually performs better for everyone. If you are the betting type, which would you take: 8% or 100%? Especially when the scale of impact can be massive.
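
The arithmetic above is simple enough to lay out in a few lines. This is a minimal sketch; the skill multipliers are the rough heuristics from the text, not measured constants.

```python
# Chance of hand-picking both the best offer (1 of 5) and the most
# valuable segment (1 of 10), at the skill multipliers discussed above.
N_OFFERS, N_SEGMENTS = 5, 10

for skill, label in [(1, "pure random guess"),
                     (2, "best non-causal practitioners (2x random)"),
                     (3, "hypothetical 3x expert")]:
    p = (skill / N_OFFERS) * (skill / N_SEGMENTS)
    print(f"{label}: {p:.0%}")  # 2%, 8%, 18%

# Serving every offer to everyone and reading the results by comparable
# segment finds the best combination by construction, subject only to
# ordinary statistical error.
```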

Remember that in all scenarios you are going to get an outcome, so the outcome alone can’t be the measure of success. Nor can the impact on a single segment, since we are not comparing it to others in context. The process of finding the right answer is far more important than a conversation about the function of a tool. The question is: did doing this one thing provide MORE value than doing another action (or doing nothing)? The only way to answer that is to compare outcomes. All of the downside comes when you look at “personalization” as just a function that you decide on and do. All of the upside comes when you discover value and then exploit it. There is nothing more valuable than being wrong, but the only way to discover that you are is to create a system that enables it.

The difference between a success and a failure with personalization comes down to this:

If the goal is to make money, the question is not CAN I do personalization, but how do I put steps in place to ensure that my actions have both a discovery and an exploitation phase?

Change the Conversation: Defining Success

One of the more common refrains I hear as I speak with different organizations or read industry blogs is: how do you deal with a failed test? People speak of this as if it is a common or accepted practice, something you need to help people understand before you move forward. The irony of these statements is that when most groups speak this way, they are measuring the value of the test by whether they got a “winner,” a recipe that beat their control. People almost always come into testing with the wrong idea of what a successful test is. Change what success means, and you will be able to change your entire testing program.

The success or failure of any test is determined before you launch it, not by the measurement of one recipe versus another. A successful test may have no recipe beat the control, and an unsuccessful test may have a clear single winner. Success is not lift, because lift without context is nice but almost meaningless.

Success is putting an idea through the right system, one which enables you to find the right answers and allows you to get performance that you would not have otherwise. If all you do is test one idea versus another that you were already considering, you are not generating lift, you are only stopping negative actions. In addition, if I find something that beats control by 5%, that sounds great, until you add the context that 3 other recipes, had I tested them, would have produced 10%, 15%, and 25% changes. Do you reward the 5% gain, or the 20% opportunity loss?

In the long run, a great idea poorly executed will never beat a mediocre idea executed correctly.

You can measure how successful a test will be by asking some very simple questions before you consider running the test:

1) Are you prepared to act on the test? – Do you know what metric you are using? Do you have the ability to push results live? Is everyone in agreement before you start that no matter what wins, you will go with it? Do you know what the rules of action are, when you can call a winner, and when it is too soon? If you answered no to any of those questions, then any test you run is going to be almost meaningless.

2) Are you challenging an assumption? – This means that you need to be able to see not only whether you are correct, but whether you are wrong. It also means that you need to have more than 1 alternative in a test. Alternatives need to be different from each other and allow for an outcome outside of common opinion to take hold. Consider any test with a single alternative a failure, as there is no way to get a result with context.

3) Are you focusing on should over can? – This is when we get caught up on whether we can do a test, whether we can target a specific group, or making sure that we can track 40 metrics. It is incredibly easy to get lost in the execution of a campaign, but the reality is that most of the things we think are required aren’t, and if we cannot tie an action back to the goal of a test, then there is no reason to do it. These items should be considered based on your infrastructure and based on value. Prioritize campaigns by how efficient they are to run, and never include more than you need to take the actions you need to take. Any conversation that is focused purely on the action is both inefficient and a red herring, taking you away from what matters.

So how then do you make sure that you are getting success from a test? If nothing else, you need to build a framework for what will define a successful test, and then make sure that all actions you take fill that framework. Getting people to agree to these rules can seem extremely difficult at first, but having the conversation outside of a specific test and making it a requirement that they follow them will help ensure that your program is moving down the right path to success.

Here is a really simple sample guideline to make sure all the tests you run will be valuable. Each organization should build its own, but they will most likely be very similar (a sketch of how such rules might be checked in code follows the list):

  • At least 4 recipes
  • One success metric that is site wide, same as other tests, and directly tied to revenue
  • No more than 4 other metrics, and all of these must be site wide and used in multiple tests
  • Everyone in agreement on how to act with results
  • Everyone prepared to do a follow-up test based on the outcome
  • At least 7 segments and no more than 20, with each segment at least 5-7% of your population and all must have a comparable segment
  • If interested in targeting, test must be open to larger population and use segments to either confirm beliefs or to prove yourself wrong. (e.g. if I want to target to Facebook users, I should serve the same experiences to all users and if I am right, then the content I have for Facebook users will be the highest performer for my Facebook segment).
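
One way to keep a guideline like this from being ignored is to encode it as a pre-launch check. This is a minimal sketch; every field name and threshold is illustrative, and each organization would substitute its own rules.

```python
# Hypothetical pre-launch check of a test plan against the guideline above.
def validate_test_plan(plan: dict) -> list[str]:
    problems = []
    if len(plan.get("recipes", [])) < 4:
        problems.append("need at least 4 recipes")
    if not plan.get("success_metric"):
        problems.append("need one site-wide success metric tied to revenue")
    if len(plan.get("other_metrics", [])) > 4:
        problems.append("no more than 4 other metrics")
    segments = plan.get("segments", [])
    if not 7 <= len(segments) <= 20:
        problems.append("need between 7 and 20 segments")
    if any(s.get("share", 0) < 0.05 for s in segments):
        problems.append("each segment must be at least 5% of the population")
    if not plan.get("action_rules_agreed"):
        problems.append("everyone must agree on how to act before launch")
    return problems

plan = {
    "recipes": ["control", "ship_v1", "ship_v2", "spotlight"],
    "success_metric": "revenue_per_visitor",
    "other_metrics": ["order_rate"],
    "segments": [{"name": "new", "share": 0.42},
                 {"name": "returning", "share": 0.58}],
    "action_rules_agreed": True,
}
for issue in validate_test_plan(plan):
    print("blocker:", issue)  # flags the segment-count rule
```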

One of the most important things an optimization program can do is make sure that all tests follow a similar framework. Success in the long run follows from how you approach the problem, not from the outcome of a specific action. You will notice that at no point here is the focus on the creation of test ideas, which is where most people spend way too much time. Any test idea is only as good as the system by which you evaluate it. Tests should never be about my idea versus yours, but about the discovery and exploitation of comparative information, where we figure out which option is best.

Which variant won, whose idea it was, and the generation of test ideas are some of the biggest red herrings in testing programs. You have to be able to move the conversation away from the inputs, and instead focus people on the creation of a valuable system by which you filter all of that noise. Do not let yourself get caught in the trap of being reactive; instead, proactively reach out and help groups understand how vital it is to follow this type of framework.

Change the conversation, change how you measure success, and others will follow. Keep having the same conversation or let others dictate how you are going to act, and you will never be able to prove success.