When Heuristics go Bad – Dealing with Common Optimization Practices – Part 1
Talk to 5 people in the optimization space and you will get 5 different stories about how best to optimize your website. Talk with 50, however, and those 5 stories will be repeated more often than not. Such is the world we operate in, where “best practices” become so commonplace and so often repeated that we rarely take the time to really think about or prove their effectiveness. Because of this phenomenon, a lot of actions that are less than ideal, or outright bad for companies, become reinforced as must-do items.
The reality is that discipline is always going to win out over specific actions, and that oftentimes the best answer is to measure everything against each other and take nothing for granted. While all of that is true, it is still important that you understand these common suggestions: where they work, how, why, and, more importantly, why people believe they are more valuable than they may really be.
Test Free Shipping or Price Changes
This is a really common one for retail sites: it is easy to understand, it is a common tactic (thanks, Amazon), and it is easy to sell to the higher-ups. The problem is not the concept itself but how people measure its impact, and what that means for other similar tactics. What can easily seem like a huge win is often a massive loss. Even worse, because of how most back-end systems are designed, the amount of work needed to run these tests can be much higher than for other simpler and extremely valuable uses of your finite resources.
Let’s look at the math of a basic free shipping test. In this simplified scenario, we sell 1 item for $90 on our site, and it costs us $70 ($20 net profit). Shipping is $10, which the customer normally pays, so a typical purchase brings in $100.
We want to test free shipping, where we pay for the shipping and sell the same widget for $90 all in. We run the test and we get a 50% increase in sales! We should be getting promotions, and in most cases the person who ran this project is shouting about their accomplishment to anyone who will listen. Obviously this is the greatest thing ever and everyone should be doing it… except you just lost a lot of money.
The problem here is that we often confuse gross revenue with net profit, especially because in a lot of other tests you are not directly changing the bottom line. In the case of free shipping or pricing tests, though, we are directly changing what a single sale means to us.
Let’s dive into the numbers of the above. Let’s say that we get 1000 orders in our normal control group.
$100 x 1000 = $100000
But the real number that impacts the business is:
$20 x 1000 = $20000
In the free shipping option, we have cut our profit per order in half by paying the $10 shipping, which means that at $10 profit per order we need twice as many orders JUST TO BREAK EVEN.
$20000 / $10 = 2000
This means that if we fall back to the standard RPV (revenue per visitor) reporting that you look at for other types of tests, then, assuming each group sees the same number of visitors, the math says that:
$100 x 1000 = $100000 (control)
$90 x 2000 = $180000 (free shipping at break-even)
So any option where we do not raise RPV to at least 180% of control (an 80% lift) means we are losing money, even though revenue is going up. With the 50% increase from our test, 1500 orders at $10 profit each is only $15000, down from $20000. So many times you see reports of amazing results from these kinds of optimization efforts that mask the realities of the business. It can be hard, no matter how much this makes sense in conversation, to have the discipline to think about a 50% increase as a loss, but that is exactly what happened here. Sadly, this hypothetical story plays out often in the real world, and the most likely result is that the results get promoted rather than the impact to the business being rationally evaluated.
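The arithmetic above can be checked with a few lines of Python. This is a minimal sketch using only the numbers from the simplified scenario. The function and variable names are illustrative assumptions, and it assumes both groups see the same number of visitors, so that comparing revenue stands in for comparing RPV.

```python
# Numbers from the simplified scenario in the text.
ITEM_PRICE = 90.0       # price of the widget
ITEM_COST = 70.0        # what the widget costs us
SHIPPING = 10.0         # shipping cost per order
CONTROL_ORDERS = 1000


def totals(orders, customer_pays_shipping):
    """Return (revenue, profit) for a group with the given order count."""
    paid_by_customer = ITEM_PRICE + (SHIPPING if customer_pays_shipping else 0.0)
    cost_to_us = ITEM_COST + SHIPPING  # we pay for the item and the shipping either way
    revenue = paid_by_customer * orders
    profit = (paid_by_customer - cost_to_us) * orders
    return revenue, profit


control_revenue, control_profit = totals(CONTROL_ORDERS, customer_pays_shipping=True)

# Profit per order once we absorb the shipping cost.
_, free_ship_profit_per_order = totals(1, customer_pays_shipping=False)

# Orders needed in the free shipping group just to match control profit.
break_even_orders = control_profit / free_ship_profit_per_order
break_even_revenue, _ = totals(break_even_orders, customer_pays_shipping=False)
break_even_revenue_ratio = break_even_revenue / control_revenue

# The "winning" test result: 50% more orders than control.
test_orders = CONTROL_ORDERS * 1.5
test_revenue, test_profit = totals(test_orders, customer_pays_shipping=False)

print(f"control:    revenue ${control_revenue:,.0f}, profit ${control_profit:,.0f}")
print(f"break-even: {break_even_orders:,.0f} orders, "
      f"{break_even_revenue_ratio:.0%} of control revenue")
print(f"50% lift:   revenue ${test_revenue:,.0f}, profit ${test_profit:,.0f} "
      f"({test_profit / control_profit - 1:+.0%} vs control)")
```

Run as written, the 50% lift shows revenue rising from $100,000 to $135,000 while profit falls from $20,000 to $15,000, which is why revenue alone is the wrong scoreboard for this kind of test.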
This same scenario plays out any time a test changes the margin on a sale far more than it changes the gross amount. The other common example is price changes, where the cost of the item remains fixed and the test really only affects how much margin we make on the item. In both cases we need to set minimum marks before starting a test and treat those as the neutral point, rather than the usual relative percentage lift we might be accustomed to.
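One way to set that minimum mark before a test starts is to compute the lift in orders needed to keep total profit flat. The sketch below assumes a fixed per-order margin before and after the change. `break_even_lift` and `is_real_win` are hypothetical helper names, and the $85 price cut is an invented illustration, not a figure from the scenario above.

```python
def break_even_lift(old_margin, new_margin):
    """Relative increase in orders needed for total profit to stay flat.

    Margins are profit per order before and after the change.
    """
    if new_margin <= 0:
        raise ValueError("new margin must be positive for a break-even point to exist")
    return old_margin / new_margin - 1.0


def is_real_win(observed_lift, old_margin, new_margin):
    """True only if the observed lift in orders beats the break-even lift."""
    return observed_lift > break_even_lift(old_margin, new_margin)


# Free shipping scenario from the text: margin drops from $20 to $10.
free_shipping_neutral = break_even_lift(20.0, 10.0)

# Hypothetical price test: cutting a $90 item with a $70 cost to $85.
price_cut_neutral = break_even_lift(90.0 - 70.0, 85.0 - 70.0)

print(f"free shipping neutral point: {free_shipping_neutral:+.0%} orders")
print(f"price cut neutral point:     {price_cut_neutral:+.1%} orders")
print("50% lift a win for free shipping?", is_real_win(0.50, 20.0, 10.0))
```

The design choice here is to judge every result against the neutral point rather than against zero, so a 50% lift in the free shipping case is reported as a loss.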
Always repeat content on your site
This and a large number of other common personalization-type suggestions (whom to target and how to target them) have a large number of issues inherent to them. The first is that even if what is suggested is true, it does not mean that it is the most valuable way to tackle the problem. Just because repeating content improves performance by 3% does not mean that doing something else completely will not result in a 10% or 50% increase.
The sad truth is that repeating content, when it does work, often produces a very small incremental gain that pales in comparison to many other content concepts you could be trying. The goal is not just to do something that produces an outcome, since every action produces an outcome. The goal is to find the action that produces the maximum outcome for the lowest amount of resources. In that light, repeating content is often, but not always, a poor use of time and resources. It is talked about so often not because of its performance but because it is easy to understand and easy to get buy-in for from above.
The second major problem with these suggestions is that they skip the entire discipline that leads to the answer. There is no problem with repeating content as long as you also try 3-4 other completely different forms of content. Repeating content may be the right answer, it may be an OK answer, or it may be the worst answer, but you only know that if you are open to discovering the truth. There is no problem with having a certain group or behavior you want to try targeting. The issue is when you target that group without looking at the other feasible alternatives. If you are not testing multiple concepts on everyone and looking for the best combination, then no matter what you do you are losing revenue (and making yourself and your team do extra work).
The real irony, of course, is that if you test these ideas in a way that measures their impact against other alternatives, the absolute worst-case scenario is that you are correct and you target as you would have liked. Any other scenario presents you with a piece of content, a group, or both that results in better performance. Knowing this allows you to save time and effort in the future, as well as spend resources on actions that are more likely to produce a result.
It is not unusual to find that targeting just a specific group will result in that group showing a slight increase, and if that is all you look at, you would have evidence to present and share internally as a success. Looking deeper, you commonly find that the overall impact to the business is negligible (within the standard 2% natural variance) or, even worse, negative for the whole. It is also not uncommon to find that a combination you never thought of produces a massive gain.
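To see how a segment-level win can vanish at the whole-site level, the sketch below rolls segment results up into one overall lift. All counts are invented for illustration, the names are hypothetical, and the 2% band is simply the natural variance figure mentioned above rather than a statistical test.

```python
NOISE_BAND = 0.02  # the "standard 2% natural variance" mentioned in the text

# Hypothetical conversions per segment: (control, targeted experience).
# Both experiences are assumed to receive equal traffic within each segment.
results = {
    "targeted group": (200, 212),
    "everyone else": (800, 792),
}


def lift(control, variant):
    """Relative change of variant over control."""
    return (variant - control) / control


def overall_lift(segment_results):
    """Lift across all segments combined, not just the targeted one."""
    total_control = sum(c for c, _ in segment_results.values())
    total_variant = sum(v for _, v in segment_results.values())
    return lift(total_control, total_variant)


for name, (control, variant) in results.items():
    print(f"{name}: {lift(control, variant):+.1%}")

site_lift = overall_lift(results)
verdict = "within noise" if abs(site_lift) <= NOISE_BAND else "meaningful"
print(f"whole site: {site_lift:+.1%} ({verdict})")
```

With these invented counts the targeted group is up 6% while the site as a whole moves by less than half a percent, which is the pattern described above: a result that looks like a success only if the targeted group is all you report.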
One of my favorite stories along these lines comes from an organization that had decided exactly how and what to target to a number of specific groups, based on a very complex statistical analysis of site behaviors. They had built out large amounts of infrastructure to facilitate this exact action. Instead, we took 100% of the content they already had and served it to everyone, as well as in a few different dynamic permutations, and looked at the impact of serving it to the groups they had envisioned as well as to others. The results showed that if they had done only what they had envisioned, they would have lost 18% of total leads on the site (this is also a great example of why causal inference is so vital, and why you should not rely on correlative inference). They also found that by serving 2 of their normal pieces of content based on behaviors they had not envisioned, they would see a 20% gain. They were able to go from causing dramatic harm to their business to a large, meaningful, multimillion-dollar gain simply by not relying solely on hearsay and instead testing their assumptions.
In both cases there are many different ways you can manipulate the data to make it look like there was a positive outcome while actually doing damage. In both cases massive amounts of time and effort were spent trying something, only to find an outcome counter to people’s assumptions. In both cases testing assumptions and exploring the value of different actions beforehand would have better informed the decision and created more value.
In the end, any idea is only going to be as valuable as the system you put it through. There is nothing inherently wrong with either concept as long as it is measured for efficiency and acted on rationally. If you can take a common heuristic and evaluate it properly, there is value to be had. That does not mean these heuristics will act as a magical panacea, nor should you plan your program around such flawed, simple ideas. Focus on building the proper system and you will be able to provide value no matter what concepts get thrown your way.