I read a post by a famous blogger today that brought up one of my least favorite things about our industry: the over-reliance on statistical measures to prove that someone is “right”. Whether it is a Z-score, t-test, chi-squared, or some other measure, people love to throw them out and use them as the end-all be-all of confirmation that they, and they alone, are correct. Beyond the hubris of the situation, it is just one more tool for people to abuse and not understand.
What is funny about this is that these are some of the worst true measures of the impact and importance of data (or of “who is correct”). They work great in a controlled setting with infinite data, but in the real world they are just one of many imperfect standards for measuring the impact of data and changes. They cannot replace understanding patterns, looking at the data, and understanding its meaning. I would settle for people understanding what those tests even tell you (hint: it is not that you are 95% confident you will get a 10% lift).
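To make that hint concrete, here is a minimal sketch of what a standard significance test actually computes. This is a plain two-proportion z-test (the visitor counts and conversion numbers are made up for illustration): the p-value only says how surprising a difference this large would be *if there were no real difference at all*. It says nothing about how big a lift you will actually keep.

```python
from statistics import NormalDist

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test: returns (observed lift, two-sided p-value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    lift = (p_b - p_a) / p_a
    return lift, p_value

# Hypothetical numbers: 10,000 visitors per recipe.
lift, p = two_proportion_z(500, 10_000, 550, 10_000)
# A 10% observed lift, yet the p-value is well above 0.05 here.
# "95% confident" would only mean: a gap this large is unlikely under
# the assumption of NO difference -- not "95% sure of a 10% lift".
```

Note that the observed lift (10%) and the confidence level are two separate numbers; confusing them is exactly the mistake the post is describing.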
It is as if someone found a fancy new tool and suddenly has to apply it everywhere: they realize that what they were doing before was wrong, but now this one thing will suddenly make them perfect.
For the record, here are three really simple steps to measure the impact of changes:
1) Look at performance over time – Look at the graph, look for consistency of data, and look for a lack of inflection points (comparative analysis). Make sure you have at least 1 week of CONSISTENT data (that is not the same as 1 week of data). This also gets into why visitor-based metric systems are much better for this analysis, and why you need to think in terms of propensity of action.
2) Make sure you have enough data – The amount needed changes by site. For some sites, 1,000 conversions per recipe is not enough; for others, 100 per recipe is. Understand your site and your data flow. I cannot stress enough that data without context is not valuable.
“Information is not knowledge”
3) Make sure you have meaningful differentiation – Make sure you know what the natural variance is for your site (in a visitor-based metric system, it is pretty regularly right at 2% after a week). Make sure that the lift is consistent, and that it is more than mechanical noise (something laboratory stats equations DON’T account for). You can be 99% confident at a .5% lift, and I will tell you that you have nothing (a neutral). You can have a 3% lift at 80% confidence, and if it is over a consistent week (not daily performance) I will tell you that you have a decent win.
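The steps above can be sketched as a simple set of checks. The 2% natural-variance threshold and the 7-day window come straight from this post; the daily conversion rates and the rule for "consistent" (lift in the same direction every day) are illustrative assumptions, and step 2 (enough volume) still has to be judged per site, since rates alone don't tell you conversion counts.

```python
def decent_win(daily_control, daily_test, natural_variance=0.02, min_days=7):
    """Rough check for a real win: enough consistent time, consistent
    direction, and an overall lift above the site's natural variance."""
    if len(daily_control) < min_days or len(daily_test) < min_days:
        return False  # step 1: not a full week of data yet
    daily_lifts = [(t - c) / c for c, t in zip(daily_control, daily_test)]
    if not all(lift > 0 for lift in daily_lifts):
        return False  # steps 1 & 3: lift flips sign, not consistent
    overall_lift = sum(daily_test) / sum(daily_control) - 1
    return overall_lift > natural_variance  # step 3: beats site noise

# Hypothetical week of daily conversion rates (propensity of action):
control = [0.050, 0.051, 0.049, 0.052, 0.050, 0.048, 0.051]
test    = [0.052, 0.053, 0.051, 0.054, 0.052, 0.050, 0.053]
print(decent_win(control, test))  # a ~4% lift, steady all week
```

The point of writing it this way is that no single number decides the call: the week of consistency and the comparison against the site's own noise floor matter as much as the size of the lift.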
I am not saying that there is no value in p-value based calculations, but I will stress that they are not a panacea, nor are they an excuse for not doing the active work of understanding and acting on your data.