May 21, 2012

Rant – Why whichtestwon Makes you a Worse Tester

There is nothing less important than which recipe won a test.

I want to let that sink in.

Everyone loves to get caught up on which recipe won, because it is what you look at and it is what others want to know, but as a tester, it is the way that you arrive at that answer that determines if you actually provide value or just an answer. Individual outcomes interest people who have something invested in being “right,” whereas consistent, meaningful discipline is what matters for people who are invested in improving things consistently. If you knew that something was only the 2nd best out of 10 different feasible alternatives, you wouldn’t pick it, but when you only compare two things, that is most likely what you are doing. You haven’t accomplished anything and you are actually losing money. If you didn’t actually measure the outcomes of multiple alternatives, or if you didn’t measure against a global, site-wide metric, or if you did not account for the cost to arrive at that conclusion, then you are fooling yourself into thinking you have accomplished something when all you did was take resources from others to make yourself look good. It may impress others, but it has not provided one bit of value to the organization.

In order to find the best alternative, you need the context of the site, the resources, the upkeep, and a measure of how effective the alternatives are against each other. Even if something is better, without insight into what other alternatives would do, you are simply replicating the worst biases that plague the human mind. Figuring out the better of two options is an answer; finding out the value of different feasible alternatives is providing value. Finding out who was right by “picking the winner” is great for people’s egos, but making sure you are measuring multiple alternatives and that you are choosing the options that provide the highest return to the largest population for the lowest cost is what makes you successful.
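As a minimal sketch of what "highest return for the lowest cost" could look like in practice, the snippet below ranks several alternatives by net value instead of by raw lift. Everything in it is an assumption made for illustration: the `Alternative` type, the use of revenue per visitor as the site-wide metric, a single cost figure per alternative, and all of the numbers.

```python
from typing import NamedTuple


class Alternative(NamedTuple):
    name: str
    revenue_per_visitor: float  # assumed site-wide metric (hypothetical)
    cost: float                 # assumed total cost to build and maintain


def net_value(alt, control_rpv, visitors):
    """Incremental revenue over control, minus what the alternative cost."""
    return (alt.revenue_per_visitor - control_rpv) * visitors - alt.cost


def rank_alternatives(alts, control_rpv, visitors):
    """Order alternatives from highest to lowest net value."""
    return sorted(
        alts,
        key=lambda a: net_value(a, control_rpv, visitors),
        reverse=True,
    )


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    alts = [
        Alternative("A", 1.05, 2000.0),
        Alternative("B", 1.08, 9000.0),
        Alternative("C", 1.04, 500.0),
    ]
    for alt in rank_alternatives(alts, control_rpv=1.00, visitors=100_000):
        print(alt.name, round(net_value(alt, 1.00, 100_000)))
```

With these invented figures, B has the largest raw lift but ranks last once its cost is counted, and C, the smallest lift, ranks first. A two-way test of A against control would have declared a "winner" and never surfaced that.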

To make it worse, people then look at the results and think that they will get the same result for their site, and in the worst case, they do. Sites like whichtestwon, which focus on letting people find out what won between two options, sound great and capture people’s attention. They let you guess and pat yourself on the back when you are right or wrong, but the reality is that they are designed to feel good, not to actually provide value. If you wanted a site like that to provide value, then they would require

The problems of a tester are twofold: first, convincing others to test, and second, improving the testing to make sure that you are maximizing return and lowering cost. A good tester needs to be able to balance both, since there is little to gain outside of personal reward in just foolishly running tests. But sites like whichtestwon? They are designed to assist only the first: to provide evidence for people that you can get a positive outcome (missing that you also get outcomes from other uses of the same resources) without actually giving any real insight into whether you did provide a positive outcome (an outcome, by itself, tells you nothing). They are designed exclusively for people to abuse to push their own agenda. To take a quote directly from their tour:

“Site shows stats from various A/B tests – Finally I’ve got evidence to show clients on a load of design decisions!”

That shows everything that is wrong. Testing should be about seeing what the value of different test variants is, not making the case for a specific one that you want. In order to be successful, you have to prove yourself wrong. If it would have worked the first time, then there was no point in the test (and you are wasting resources to run it) and you have learned nothing. You should not be given “credit” when you are adding additional cost and providing nothing more than validation for others. When you are wrong, when you have tested what you want and tested other alternatives, and you find that other alternatives prove to be more efficient, even if what you wanted was better than control, that is the moment you are truly gaining something from your testing efforts.

There is a plague of people in our industry who try everything they can to show how much value they got from a single test. Who view testing as a way to get what they want up on the site over the HiPPO or someone else. Who abuse testing to push their agenda and who then take credit when they find something that proves better than what was there before. The act of running a test is not a measure of success, nor is having an outcome. Added value only comes from finding an outcome that is different than what you would have already done. In order to do that, you must measure multiple feasible alternatives and find an outcome different than what people want. If you aren’t able to do so, then the most fundamental problem you have is you, and how you think about testing. If you are able to, then the individual outcome, what won, is far less important than how you got there and what you chose not to do. The measure of a testing program is how often it proves people wrong, and how consistently it can do that with the least amount of resources possible.

Being a good tester means that you always know the relative costs. It means that you know how often something works, not just if it did one time. To be good, you should be able to create meaningful, actionable lift on all your tests, not jump for joy and promote yourself to the world when you managed to find one thing better on 1 out of 5 tests. Don’t settle for taking the easy road and trying to take credit. Add value, be better, learn how to look at things and you will actually create value, today and always. If you go down that road, however, then no one cares which variant won; it has no bearing on long-term success. Great, you found the thing to push from this campaign, but that is just one small step on a long road of continuous action. You wouldn’t reward someone because they managed to write their name on a test, so please do not think that whichtestwon somehow does anything to inform you how to be a better tester.
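To make "how often something works" concrete, here is a rough sketch that reports the share of tests that produced a lift above some threshold, instead of showcasing a single win. The function name `win_rate`, the `min_lift` threshold, and the sample lifts are all hypothetical and are not taken from any real program.

```python
def win_rate(lifts, min_lift=0.0):
    """Share of tests whose measured lift exceeded min_lift."""
    if not lifts:
        raise ValueError("need at least one test result")
    wins = sum(1 for lift in lifts if lift > min_lift)
    return wins / len(lifts)


if __name__ == "__main__":
    # Invented lifts from five tests; only the first beat control.
    print(win_rate([0.04, -0.01, 0.0, -0.02, 0.0]))
```

In this invented case the one win everyone would want to publicize is 1 out of 5 tests, which is the number that actually describes the program.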

If you really wanted to see a site like whichtestwon matter, then show the variants that didn’t win. Show multiple options for each outcome and show what the best option was. Give us a measure of the cost and tell us the internal roadblocks that you had to overcome. Let us know if that outcome was greater or worse than others for that group and what they are doing with the results to get a better, more efficient result next time. If you are interested in anything more than self-promotion, post the things that don’t work. Tell us how often something wins, not the one time it did win. Use the site to find examples of where you were wrong and inform yourself that you are not right… ever. The most we can ever hope to be is a little less wrong, and working on a way to speed up the process of discovering just how wrong we are.
