
When Heuristics go Bad – Dealing with Common Optimization Practices – Part 1

Talk to 5 people in the optimization space and you will get 5 different stories about how best to improve your website. Talk with 50, however, and those 5 will get repeated more often than not. Such is the world we operate in, where “best practices” become so commonplace and so often repeated that we do not take the time to really think about or prove their effectiveness. Because of this phenomenon, a lot of actions which are less than ideal or outright bad for companies become reinforced as must-do items.

The reality is that discipline is always going to win out over specific actions, and that oftentimes the best answer is to measure everything against each other and take nothing for granted. While all of that is true, it is still important that you understand these common suggestions: where they work, how, why, and, more importantly, why people believe they are more valuable than they may really be.

Test Free Shipping or Price Changes

This is a really common one for retail sites: it is easy to understand, it is a common tactic (thanks, Amazon), and it is easy to sell to the higher-ups. The problem is not the concept itself, but how people measure its impact, and what that means for other similar tactics. What can easily seem like a huge win is often a massive loss. Even worse, because of how most back-end systems are designed, the amount of work needed to run these tests can be much higher than for other, simpler and extremely valuable uses of your finite resources.

Let’s look at the math of a basic free shipping test. In this simplified scenario, we sell 1 item for $90 on our site, with an actual cost of $70 to us ($20 profit per order). Shipping is $10 and is paid by the customer, which means that on a normal purchase someone pays us $100.

We want to test free shipping, where we pay for the shipping and the customer now pays only $90 for the same widget. We run the test and we have a 50% increase in sales! We should be getting promotions, and in most cases the person who ran this project is shouting their accomplishments to the entire world and to everyone who will listen. Obviously this is the greatest thing ever and everyone should be doing it… except you just lost a lot of money.

The problem here is that we often confuse gross and net profit, especially because in a lot of other tests you are not directly changing the bottom line. In the case of free shipping or pricing tests, though, we are directly changing what a single sale means to us.

Let’s dive into the numbers of the above. Let’s say that we get 1000 orders in our normal control group.

$100 X 1000 = $100000

But the real number that impacts the business is:

$20 x 1000 = $20000

In the free shipping option, we have cut our profit per order in half by paying for the $10 shipping, which means that at $10 profit per order we actually have to have twice as many orders JUST TO BREAK EVEN.

$20000 / $10 = 2000

This means that if we fall back to the standard RPV reporting that you look at for other types of tests, then the math says that:

$100 X 1000 = $100000
$90 X 2000 = $180000

So any option where we do not increase RPV by at least 80% (to 180% of the control) means we are losing money. So many times you see reports of amazing results from these kinds of optimization efforts which mask the realities behind the business. It can be hard, no matter how much this makes sense in conversation, to have the discipline to think about a 50% increase as a loss, but that is exactly what happened here: 1500 orders at $10 profit is $15000, against $20000 in the control. Sadly, this hypothetical story plays out often in the real world, with the most likely result being that the results get pushed live instead of being rationally evaluated for their impact on the business.

This same scenario plays out any time a test changes our margin more than it changes the gross price. The other common example is price changes, where the cost of the item remains fixed and the test is only truly impacting how much margin we make off of the item. In both cases we are forced to set minimum marks prior to starting a test, and to treat those as the neutral point, instead of the normal relative percentage lift that we might be accustomed to.
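The break-even arithmetic above can be sketched in a few lines of Python. This is only an illustration using the figures from the example; the function and variable names are made up for the sketch, and it assumes both groups receive the same amount of traffic, so that revenue and RPV move together.

```python
def break_even_multiplier(control_margin: float, variant_margin: float) -> float:
    """Multiple of control order volume the variant needs just to match control profit."""
    if variant_margin <= 0:
        raise ValueError("variant loses money on every order; no volume breaks even")
    return control_margin / variant_margin


def profit(orders: int, margin: float) -> float:
    """Total profit for a number of orders at a given profit per order."""
    return orders * margin


# Figures from the example above.
control_orders = 1000
control_price, variant_price = 100.0, 90.0    # what the customer pays per order
control_margin, variant_margin = 20.0, 10.0   # what we keep per order

multiplier = break_even_multiplier(control_margin, variant_margin)
break_even_orders = control_orders * multiplier

# Revenue needed at break-even, relative to control revenue.
required_revenue_ratio = (variant_price * break_even_orders) / (control_price * control_orders)

# The celebrated result: 50% more orders in the free shipping group.
variant_orders = int(control_orders * 1.5)
profit_change = profit(variant_orders, variant_margin) - profit(control_orders, control_margin)

print(multiplier)              # 2.0
print(required_revenue_ratio)  # 1.8
print(profit_change)           # -5000.0
```

Setting the minimum mark this way before the test starts makes it much harder to celebrate a result that sits below the neutral point.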

Always repeat content on your site

This and a large number of other common personalization suggestions (who to target and how to target them) have a large number of issues inherent to them. The first is that even if what is suggested is true, it does not mean that it is the most valuable way to tackle the problem. Just because repeating content improves performance by 3%, it doesn’t mean that doing something else completely will not result in a 10% or 50% increase.

The sad truth is that repeating content, when it does work, is often a very small incremental gain and pales in comparison to many other concepts of content that you could be trying. The goal is not just to do something that produces an outcome, as every action produces an outcome; the goal is to find the action that produces the maximum outcome for the lowest amount of resources. In that light, repeating content is often, but not always, a poor use of time and resources. The reason it is talked about is often not its performance but that it is easy to understand and easier to get buy-in for from above.

The second major problem with these is that they skip the entire discipline that leads to the answer. There is no problem with repeating content as long as you also try 3-4 other completely different forms of content. Repeating content may be the right answer, it may be an ok answer, and it may be the worst answer, but you only know that if you are open to discovering the truth. There is no problem having a certain group or behavior you want to see if you can target to, the issue is when you target to them without looking at the other feasible alternatives. If you are not testing out multiple concepts to everyone and looking at them for the best combination, then no matter what you do you are losing revenue (and making you and your team do extra work).

The real irony of course is that if you test these out in a way to find out the impact compared to other alternatives, the absolutely worst case scenario is that you are correct and you target as you would have liked. Any other scenario presents you either with a piece of content or the group or both that results in better performance. Knowing this information allows you to save time and effort in the future as well as spend resources on actions that are more likely to produce a result.

It is not unusual to find that targeting just one specific group will result in that group showing a slight increase, and if that is all that you look at, you would have evidence to present and share internally as success. Looking at the issue more deeply, you commonly find that the overall impact to the business is negligible (within the standard 2% natural variance) or, even worse, negative to the whole. It is also not uncommon to find a combination that you never thought of producing a massive gain.

One of my favorite stories in this line was when I worked with an organization that had decided exactly how and what to target to a number of specific groups based on a very complex statistical analysis of site behaviors. They had built out large amounts of infrastructure to facilitate this exact action. We instead took 100% of the content they already had and served it to everyone, in a few different dynamic permutations, looking at the impact of serving it to the groups they had envisioned as well as to others. The result showed that if they had done only what they had envisioned, they would have lost 18% of total leads on the site (this is also a great example of why causal inference is so vital, and why not to rely on correlative inference). They also found that by serving 2 of their normal pieces of content based on behaviors they had not envisioned, they would see a 20% gain. They were able to go from causing dramatic harm to their business to a large, meaningful, multimillion dollar gain simply by not relying solely on hearsay and instead testing their assumptions.

In both cases there are many different ways you can manipulate the data to look like there was a positive outcome while actually doing damage. In both cases massive amounts of time and effort were spent to try something, only to find an outcome counter to people’s assumptions. In both cases testing out assumptions and exploring to discover the value of different actions beforehand would have better informed the work and created more value.

In the end, any idea is only going to be as valuable as the system you put it through. There is nothing inherently wrong with either concept as long as they are measured for efficiency and acted on rationally. If you can take a common heuristic and evaluate it properly, there is value to be had. That does not mean that they will act as a magical panacea, nor should you plan your program around such flawed, simple ideas. Focus on building the proper system and you will be able to provide value no matter what concepts get thrown your way.


Why we do what we do: You are Not so Special — The Forer Effect

While the quest for personalization may be newer to some members of the online marketing world, the reality is that it is a concept as old as sales. People have always been trying to convince others that they alone were getting a special deal or that a message was meant just for them. One of the great practitioners of this concept was P.T. Barnum, who famously billed his circus as “we’ve got something for everyone.” On some level everyone understands the appeal of being special and of having someone take the time to tell me something that is unique to just me. The greatest salespeople, though, understood one of the great ironies of personalization, which is that general statements, when given in context, are often treated as deeply personal and are extremely powerful. This concept is known as the Forer effect, or more directly, the tendency of people to interpret statements as being accurate for them personally, even when they are not.

The Forer effect gets its name from B.R. Forer and came about from a series of experiments that he performed in 1948. His famous study involved giving a personality test to all of his students. He told them that they were each receiving a unique personality analysis, and they were to rate that analysis on a scale of 0 to 5. All of the students actually received the exact same results, using such lines as, “While you have some personality weaknesses you are generally able to compensate for them.” and “You also pride yourself as an independent thinker; and do not accept others’ statements without satisfactory proof.” Despite the exact same statements being made to everyone, his students’ average score for his “analysis” was 4.26.

The favorite trick of psychics, conference speakers, and astrologists, this psychological bias is important to understand, especially when thinking about the concept of personalization. General statements dressed up to look like targeted messages have much greater impact than direct statements, and are far more likely to increase belief in the speaker. Personalization, as it turns out, is about not being personal, or at least not to the Nth degree. Personalization is about the match of the general and the pseudo specific, and it is about taking that message to the largest group possible, not just the ones that directly match the message. The more we can measure different types of messages, and the more we can find the largest groups that respond to them, the better our results, since functionally the more targeted the message, the less overall gain we get in total site performance.

So you might ask, why is this so important in the quest for personalization? This bias tells us that overthinking personalization and designing a large number of specific messages is both a waste of resources and far less likely to create a positive outcome. It also tells us that the message itself does not have to match the rules that dictate the outcome; general statements have an impact for a large variety of people, not just a specific targeted group.

As you start thinking about and tackling your personalization programs, it is important to understand the nature of why you are doing these actions. Barnum knew that he was there to sell his circus, and every action he took had only that outcome in mind. He was one of the most famous practitioners of a single success definition, and he knew that whatever he did needed to drive more people to spend more on his circus. The same is true of all online efforts. Your goal in the end is to make more money, and the key is not to focus on a specific message, or to over-rely on experts or correlative information to tell you when and how to target. The key is to test out all sorts of possible content, and to see how you can best present it to people so that you get the efficiency of the largest group of people possible.

This is why a message about a specific product may work best for Firefox users, why time of day may be the best match for your re-targeting content, and why smaller segments are so inefficient. It is also one of the main reasons why targeting content without the discovery process of the value is far more likely to lose you revenue than generate more. It turns out the more you try to narrow a message or assume an outcome, the worse your results will be. Somewhat specific messages work for far larger groups than you could ever imagine, and you only know the true power when you let go of your own ego and preconceived notions and explore.

Stop thinking of personalization as trying to build a one on one message with a customer; that does not work and is extremely inefficient. Instead, explore the various ways that you can create different content, and then explore who the largest groups are that you can present that to. This means always going through a discovery process of figuring out what matters, and then figuring out for whom. You may want to target only people who looked at brand X, or page Y, or who have come to your site 3 times without purchasing, but that in no way means that you should limit the message to just that group. The less control you exert on the specifics of a message, and the more you are open to new possibilities, the more likely you are to find larger and more meaningful outcomes.

Explore what the value is of different messages and of taking it to different groups. You have powerful tools at your disposal to do just that, to discover and take these more general statements to large groups. From simple A/B tests all the way to automated machine learning, the real key to value comes from how you think about the problems, not in your ability to just find a group and target to it. Not only that, but you have the ability to measure the efficiency of various discoveries and techniques against each other. You are not limited to creating these stories, or just targeting to a specific persona, you have so much more at your disposal if you just allow yourself and others the flexibility to learn and grow.

P.T. Barnum is also famous for how he could get people to pay for anything, with the most famous example being the egress. It wasn’t meant for anyone specific, but he could get just about anyone to fall prey to the mystery. He didn’t have to target that message to just one group, or to offer it for only people who were on their way out, he figured out how to take that message to everyone. He understood that just because a group might be inclined for something, that just limiting your message to that group was a waste of his time. He was the ultimate salesman, but he knew that the key was to make it look like you were walking a fine line and being extremely specific, while at the same time in no way going that far.

So the question comes down to this: as you explore personalization, are you selling the egress? Or are you the one on your way out that door?

The Difference between Success and Failure with Personalization

Personalization is such a buzzword right now that it is nearly impossible to have a conversation in the digital marketing space without it coming up. Everyone is on this quest for a “personalized experience” or to make sure that they are doing what every other group is doing. You constantly hear about all this new technology and all these new ways to accomplish this task. There are more tools and more information about our users now than ever before, and yet there are very few groups or people who can actually differentiate between success and failure for personalization.

The most fundamental thing people forget about “personalization” is that I can “personalize” an experience in almost infinite ways. I can change copy, I can change work flow, I can change layout or features of the experience. Even better, I can do this for the same user in a thousand different ways. I am a returning user to your site… but I am also a user in the afternoon, who came from Google, who has been on the site 12 times, who has made 3 purchases, and who is using Firefox. So the question is not CAN I personalize an experience; at this point there are a thousand different tools and ways to do so. The simple act of creating an experience is not the goal. The goal is to do so in the way that generates the greatest ROI for my organization.

The question needs to be, how do I discover the most valuable way to change the experience?

What we need to incorporate in any concept of personalization is a way to measure these different concepts against each other. We have to build into every process a period of discovery, using tools that allow us to know the two most valuable pieces of information when it comes to personalization:

What is my ability to change their behavior?

What is the cost to do so?

There is no way to acquire that information without actively making changes and seeing the outcome. Measuring that different groups have different behavior is easy, but what does that tell you about your ability to change that behavior? Just because one group of users purchases twice as often as another, how do you know your ability to change that behavior? How do you know that a different experience will do anything more than a static similar experience for both?

And that is the difference between success and failure when it comes to personalization. Are you just serving up an experience because you can? Or have you done the active acquisition of knowledge that shows not only that it improves performance, but that it is the best way to increase performance?

I want to give a functional example so that you can see this in action. Let’s take the exact same concept and see it executed under both ways of thinking.

Let us say that it is coming up on the holiday season, and you want to serve up a holiday shipping message to people who have purchased on your site before.

If my goal is increased revenue, then the steps would be as follows:

1. Create multiple executions of the message (how do you know if the concept or the execution is the issue with one offer?)

2. Take 2 to 3 other messages that could be used there (one will most likely be your default content), other concepts such as specific products or specific site offerings. Hopefully you are just reusing existing content.

3. Serve all the offers to EVERYONE

4. Look at the results by segment and calculate the total gain by giving a differentiated experience:
i. If you are correct, then the highest performing recipe for the previous shopper segment will be one or both of the shipping messages. Default content would then be the winner for the non-purchaser segment (the comparable segment).
ii. If you are wrong, then some other segment, or some other offer, will produce a higher-performing winner. Be open to a permutation winning that you never thought of. Being wrong is always going to provide the greatest return.

5. Push live the highest revenue producing opportunity found
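As a rough illustration of steps 3 through 5, here is a small Python sketch that takes test results by segment, projects the revenue of a few different plans, and compares them. Every number, segment name, and offer name in it is made up for the example, and it assumes traffic was split randomly so that each cell's revenue per visitor can be projected onto the whole segment.

```python
from collections import defaultdict

# Hypothetical results with every offer served to a random share of EVERYONE.
# (segment, offer) -> (visitors, revenue). All figures are invented.
RESULTS = {
    ("previous_purchaser", "default"):    (5000, 10000.0),
    ("previous_purchaser", "shipping_a"): (5000, 11000.0),
    ("previous_purchaser", "product_x"):  (5000, 12500.0),
    ("non_purchaser", "default"):         (20000, 20000.0),
    ("non_purchaser", "shipping_a"):      (20000, 23000.0),
    ("non_purchaser", "product_x"):       (20000, 19000.0),
}


def summarize(results):
    """Return revenue per visitor for each (segment, offer) and total visitors per segment."""
    rpv = defaultdict(dict)
    size = defaultdict(int)
    for (segment, offer), (visitors, revenue) in results.items():
        rpv[segment][offer] = revenue / visitors
        size[segment] += visitors
    return rpv, size


def projected_revenue(plan, rpv, size):
    """Revenue if each segment were served the offer the plan assigns to it."""
    return sum(size[seg] * rpv[seg][offer] for seg, offer in plan.items())


rpv, size = summarize(RESULTS)
offers = sorted({offer for _, offer in RESULTS})

# The single offer that does best when everyone sees the same thing.
best_single = max(offers, key=lambda o: projected_revenue({s: o for s in size}, rpv, size))

plans = {
    "default for everyone": {seg: "default" for seg in size},
    "planned targeting": {"previous_purchaser": "shipping_a", "non_purchaser": "default"},
    "best single offer": {seg: best_single for seg in size},
    "best per segment": {seg: max(rpv[seg], key=rpv[seg].get) for seg in size},
}

for name, plan in plans.items():
    print(f"{name}: {projected_revenue(plan, rpv, size):,.0f} -> {plan}")
```

With these invented numbers the planned targeting does beat the default, which is exactly the kind of result that gets reported as a success, yet it trails both the best single offer and the best per-segment combination. A real analysis would also need to check that the differences are larger than natural variance before acting on them.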

Let us see how groups that get little, no, or negative value from “personalization” do the same task:

1. Push the single piece of creative to the repeat purchaser segment.

2. Hope

See, the fundamental problem is that in the second scenario you have no way of knowing if it is valuable or not. Blind belief that you are providing value is not the same as providing value. Most groups think that if they just report the outcome, or the rate of action of that group, that it somehow represents the value of that action. It doesn’t. Value only comes from the improvement of performance by that action. If you aren’t actively acquiring that information, then you have no way of knowing the value of any action. Even worse, we are adding cost, and we are suffering the opportunity cost of the gain we should be getting.

I want to show some simple math to show you the difference between the two groups. Let us say in the first test we have the 5 different experiences and that we are looking only at 10 different comparable segment groups (segments only matter if there is a different outcome for the comparable group). This might include things like new/returning, work hours/non-work hours, search/non-search, Firefox/Chrome/Internet Explorer, or any other of the infinite ways of dividing your users using any and all of the information that is available to you. You can always do more, but for the sake of argument and of efficiency, 10 different pools of the same population is enough. Segments are only valuable for targeting if we serve things to the comparable segment. If I assume that everything is purely random, then I have a 1 in 5 chance of my offer being the best. I also have a 1 in 10 chance of my segment being the MOST valuable.

(1/5) * (1/10) = 2%

So if everything is random, then I have a 2% chance that I picked the best outcome (the one that drives the highest revenue for my site), which means that in 98% of the scenarios, I have cost my site money. But let’s assume that you are REALLY good at picking segments and content based on your experience and your analysis. Having worked with nearly 300 different organizations, experience shows that the best of the people who aren’t relying on causal data are no better than 2 times random guessing at choosing a better option (they guess a right answer twice as often as a random pick would).

Most groups do not fall into that category. In reality, most groups are actually worse than random at choosing the best option.

Even for the best of them, at twice as good as random, the math is only:

(2/5) * (2/10) = 8%

Let’s say you are the best person in the world at what you do, with great analysis and all sorts of tools, so that you are three times better than random:

(3/5) * (3/10) = 18%

So if you are absolutely amazing at what you do, then 18% of the time you will have guessed the right message for the right group. 82% of the time, another outcome is better, and most likely significantly better. You can reduce the chance of missing a better-performing option to 0% with a few simple steps and by accepting that we do not always understand the patterns before us. If we go back to random chance, then 20% of the time just doing nothing (your default offer) actually performs better for everyone. If you are the betting type, which would you take? 8% versus 100%? Especially when the scale of impact can be massive.

Remember that in all scenarios you are going to get an outcome, so that can’t be the measure of success. Nor can discussing only the impact of one segment, since we are not comparing it to others in context. The process of finding the right answer is far more important than a conversation around the function of a tool. The question is whether doing this one thing provided MORE value than doing another action (or doing nothing), and the only way to answer that is to compare outcomes. All of the downside comes when you look at “personalization” as just a function that you make a decision on and just do. All of the upside comes when you discover value and then exploit it. There is nothing more valuable than when you are wrong, but the only way to discover that is to create a system that enables it.
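The three calculations above follow the same pattern, which can be written as a short Python sketch. The function name and the `skill` parameter (how many times better than random the guesser is) are illustrative, and like the text, the sketch treats the choice of offer and the choice of segment as independent.

```python
from fractions import Fraction


def chance_of_best_pick(n_offers: int, n_segments: int, skill: int = 1) -> Fraction:
    """Chance of guessing both the best offer and the best segment.

    skill=1 is a random guess; skill=2 means right twice as often as random.
    Each probability is capped at 1.
    """
    p_offer = min(Fraction(skill, n_offers), Fraction(1))
    p_segment = min(Fraction(skill, n_segments), Fraction(1))
    return p_offer * p_segment


for skill in (1, 2, 3):
    p = chance_of_best_pick(n_offers=5, n_segments=10, skill=skill)
    print(f"{skill}x random: right {float(p):.0%} of the time, wrong {float(1 - p):.0%}")
# 1x random: right 2% of the time, wrong 98%
# 2x random: right 8% of the time, wrong 92%
# 3x random: right 18% of the time, wrong 82%
```

Adding more offers or more segments to the function only makes the guesser's odds worse, which is the point: the more options there really are, the less a single up-front pick can be trusted.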

The difference between a success and a failure with personalization comes down to this:

If the goal is to make money, the question to ask is not CAN I do personalization, but how do I put steps in place to ensure that I have both a discovery phase and an exploitation phase in my actions?

Change the Conversation: What does “Efficiency” really mean?

One of the great mysteries of the analytics space is the use of words that have almost no real meaning. Words like optimization, analytics, marketing, social, value, personalization, predictive, and segment have different meanings to different people. They become useful jargon to direct a conversation, but when it comes down to giving them a real meaning, so many groups struggle because it is a very personal definition. When we do find a meaning for those words, it is usually an old tired one that has lost all relevance in the modern world. To me, the most commonly abused term is efficiency.

What does efficiency mean? Is it just an outcome? Is it something that you can actually measure? If it is as simple as just ROI, why do we fail then to really measure against it? I want to present a simple way to think about efficiency, your actions towards improving it, and then give you real world ways to use that to measure your actions and to improve the “efficiency” of your organization.

Here is the way I suggest measuring efficiency:

Efficiency = (Scale x Impact) / Cost

This gives you a value, which you can then measure against others. The difference between values shows you what is efficient and what is not efficient. It is strongly related to ROI, but separates its components and allows you to look at any action, not just revenue. We are given the choice to interact with 1 or all 3 parts, and we can measure our ability to do so on the same scale.

It is important to understand the 3 components, to make sure that everyone is on the same page.

Scale – The size of the population that is impacted.
Impact – This is the measure of recordable lift or gain. This is your ability to influence. This must be towards a site wide goal, not just a dependent goal such as the next page or clicks.
Cost – This is how much in time, energy, money or other resources it takes to acquire and maintain the impact listed above.

To do this however, you must always keep all three things in mind, not just one.

Scale reminds us that a high increase for a small group is often less valuable than a small increase for a large group. We can try to increase the scale of something, but without knowing the impact or the cost to achieve it, we have nothing.

Impact reminds you that you can’t only look at lift. If you hear that you got a 12% lift, then you are still missing two really important pieces of information. If the 12% is to 100 people or 100,000 people, it dramatically changes the outcome.

The cost to achieve those two pieces tells us if we actually did something valuable or not. An outcome that takes you 2 hours and $20 to achieve is far more valuable than the same outcome that takes you 6 months, 500 man-hours, 1.2 million in new products, and a long-term maintenance cost.

In order to enforce a conversation around maximizing return, you must first change the conversation so that you are no longer discussing only one of these metrics at a time. Do not accept a conversation that only tells you lift, or that only tells you a population without knowing the ability to impact that group and the cost to achieve that change. Do not just blindly hear that you are likely to change a metric; understand that you have to know the cost and scale of doing so. Do not just hear that a group has a different behavior; understand that you need to know the scale of impact and the cost to change them to understand the efficiency of that action.
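To make the three components concrete, here is a minimal Python sketch that scores a few actions on one scale. It reads efficiency as scale times impact divided by cost, which is an assumption based on the three components and on the description of a ratio. The actions are hypothetical, and the cost figures reduce everything to dollars, ignoring hours and maintenance for simplicity.

```python
from dataclasses import dataclass


@dataclass
class Action:
    name: str
    scale: int     # number of people the change reaches
    impact: float  # lift toward the site-wide goal, e.g. 0.12 for 12%
    cost: float    # all-in cost in dollars (a simplification)

    def efficiency(self) -> float:
        """Assumed ratio: (scale x impact) / cost. Only useful for comparing actions."""
        return self.scale * self.impact / self.cost


# The same 12% lift under different scale and cost (hypothetical actions).
actions = [
    Action("small segment, cheap", scale=100, impact=0.12, cost=20.0),
    Action("large population, cheap", scale=100_000, impact=0.12, cost=20.0),
    Action("large population, expensive build", scale=100_000, impact=0.12, cost=1_200_000.0),
]

ranked = sorted(actions, key=Action.efficiency, reverse=True)
for action in ranked:
    print(f"{action.name}: {action.efficiency():.4f}")
```

The identical 12% lift lands in first, second, and last place depending on who it reaches and what it took to get, which is why lift on its own says so little.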

So this may seem like a very simple definition for a complex issue, but it gives you the ability to truly view the world differently. To quote Jim Horning, “Nothing is as simple as we hope it will be.” We like to pretend we think these things all the time and that they are obvious in every conversation, yet time and time again we drop the entire context in the name of pushing an agenda. There are hundreds of conversations every day that talk about metrics that have nothing to do with improving performance (e.g. bounce rate) or that only talk about a single portion of performance (lift). Stop those conversations, and remind people that reducing costs or increasing scale is just as effective as improving your impact. Do not assume that everyone is putting everything in the right context, because they aren’t.

So what is efficiency? It is simply the act of making sure that you are improving this ratio, and remembering that you cannot look at only one aspect to answer a question. We can’t fail to measure actions against each other. These are not just isolated events. It is acting in a way that gives the denominator and the numerator equal weight in your discussions and actions, and doing everything in your power to reduce low value actions and increase high value actions. Once you have an action, measure it against other actions, and continue to balance the discovery of the value of actions against your exploitation of the higher value ones.

Being efficient is simply taking resources away from low value actions and towards high value actions. The very concept implies that you will stop doing certain actions and that you will do new ones you aren’t currently doing. It is the entire discipline of knowing that what you are doing today is wrong, and that there is always a better way to do things.

It is not the concept but the constant discipline of following it and holding yourself and others accountable that will truly define your outcome. Nothing here is revolutionary, other than eliminating all of the other factors and excuses people love to throw out in their arguments. It gives you a way to measure different outcomes against each other, and because of that, you can truly see what the value of various actions is relative to each other.

If you are disciplined in your tracking, honest about your impact, and willing to evaluate actions by how they help your site, and not just you, you will arrive at amazing conclusions that will shift your organization. The only way to improve is to change, so do not fear change, embrace it. Do things you aren’t sure about, challenge common thinking, do the exact opposite to see what the value of what you are doing really is. Measuring things in this simple a form is not sexy or “advanced”, and it can seem juvenile, but it is only by doing the small things well that you will ever succeed at all those large things people promise will revolutionize the world.