7 Traits of a Successful Tester

One of the greatest thrills I get is working with new analysts, consultants, and companies as they evolve into great testers. It is easy to talk about everything that is wrong with most groups, people, and agencies in our industry, but the reality is that I rarely find people who are not intelligent or not willing to work hard. What I do find is a very specific set of skills in the best people, either inherent or developed, that allows them to make a much larger impact than others. With that in mind, I want to present seven skills that all great testers have developed and work on daily to make themselves better.

Willing to challenge all ideas – There can be no stone left unturned and no idea too sacred to challenge if you really want to find the best results. Giving any and all ideas a fair shake, and seeking out ideas that you don't agree with, is a skill that takes people time to really learn to live by.

They are not beholden to a single dogma – So many groups fail because they try to force testing into Analytics, IT, Marketing, SEO, SEM, or some other specific discipline. The reality is that testing is a new skill, one that takes parts from and interacts with all of those existing disciplines, as well as any current or future ones. Great testers develop their own skill base and can talk to others on their own ground. It isn't about owning testing, but about bringing it to others.

Technical understanding of how things work – This is not the same as being someone who can write all sorts of complicated jQuery code or architect your entire system, but all testers need to understand how your site works, how different systems interact, and the different options available to accomplish each goal. To do this, they have to be comfortable working with, around, and even replacing developers as needed.

They have ADHD – In a single day, you might talk to a designer, two C-level people, and three product owners, work with project managers, engineers, and analytics, and then finish the day preparing a result for product managers. And the next day will be completely different. In over seven years of testing, I have not had two days that were alike. You have to love the constant change, the shifting conversations, the coming and going of industry concepts, and you have to be able to shift what you are doing on the fly. If you are only comfortable when you are focusing on one thing, or when you can really dive into something, testing is not for you.

Understand why people believe what they do – If one of the core tenets of testing is to prove yourself and others wrong, you are going to upset a lot of people if you cannot talk to them about what led to a conclusion and how best to leverage that thinking. One of the greatest skills is getting people to challenge themselves or to question their very core beliefs.

Not easily frustrated – Testing can be a very frustrating job. You are so often simply asking people to test out something, knowing full well that it takes longer to discuss the option than it does to add it as a recipe. Or how about when you are waiting for code to be deployed? Or when you prove that a redesign is a failure and the group decides to move forward with it anyway? The reality is that you are always fighting an uphill battle, and it is only when you are able to get past all of those frustrations that you are able to add real value. So many perfectly competent people I know fail and end up acting as project managers and yes-men, because they are no longer willing to fight the uphill battle and are not able to get past all of the frustration that they will face.

Pragmatist – More than anything else, a great tester is pragmatic and efficient in everything they do. You have to be willing to put aside dogma and be able to take any idea and deconstruct it to get more value. You will rarely be able to just run with a testing program; finding places to challenge ideas, add value, and learn is where a tester really earns their salary.

The reality is that it takes at least a year to a year and a half for even the first lights to come on for most testers. Even worse, people with these skills are constantly poached by other groups, since they can be both non-threatening and add a lot of value. No matter where you are, whether starting out or testing for years, I challenge you to look at your own actions and skill set and see what you can do to get better at each and every one of these.

Why we do what we do: Believing what you want to believe – Observer-Expectancy Effect

The human mind is a funny thing. We can be aware of all our own faults, and others', and yet when it comes to stopping ourselves from falling down the many holes that we create for ourselves, we find it much easier to see the same mistake in others than in ourselves. The next bias I want to tackle is the Observer-Expectancy Effect, or "when a researcher expects a given result and therefore unconsciously manipulates an experiment or misinterprets data in order to find it." Like its sibling, congruence bias, the Observer-Expectancy Effect impacts what we test, but it even more fully impacts what conclusions we arrive at. It is the entire phenomenon of hearing what you want to hear, or using data only to justify what you already wanted to do.

This sinister worm pops its head up in all sorts of places, whether in experiment design, using data to justify decisions, sales pitches, or even just our own view of our impact on the world. How much of regular marketing is telling you what you want to hear? Yet we forget that we are just as susceptible to those messages when we are trying to prove ourselves right.

What is important is not so much what the problem is, but how best to "fix" it. How do you ensure that you are looking at data and getting the functional result that is best, and not just letting your own psyche lead you down its predetermined path? The trick is to think in terms of the null assumption. The challenge is to always assume that you are wrong, and to look at the inverse of all your "experience"; challenge yourself to think in terms of "what if that wasn't even there" or "what if we did the exact opposite"? Make sure that you are trying to prove the inverse, that you are wrong, and you will suddenly have a much deeper understanding of the real impact of the outcomes that you are championing. When you try to prove you are right, you will find confirmation, just as when you try to prove you are wrong, you will also come to that conclusion. You have to be willing to be "wrong" in order to get a better outcome. Remember that when you are wrong, you get the benefit of the increased results, and you have learned something.

So what does this look like in the real world? Every time you decide that you are going to go down a path, you will intrinsically want to prove to yourself and others that what you are doing is valuable. The most common example of this is the quest for personalization, where we get so caught up in proving we can target to groups that we forget to measure the real impact of that decision. We forget that the same person can be looked at a thousand different ways, so when we choose to pick one, we fail to measure it against the other alternatives. The number of groups that have championed targeting to some minute segment, only to find when you look deeper into the numbers that targeting to browser or time of day would have had magnitudes greater impact, is legion.

The simplest way to test this is to make sure that all of your evaluations, correlative, causal, or qualitative, include the null assumption. What happens if I serve the same changed content to everyone? Or what happens if I serve targeted content to Firefox users instead? Despite the constant banter and my belief that a personalized experience is a good thing, what do I really see from my execution? What about if we target the groups that don't show different behavior in our analysis? Keep deconstructing ideas and keep trying to find ways to break all the rules, and you will find them. Even better, those are the moments where you truly learn and where you truly get value that you would not have gotten from going straight to the action.
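To make that concrete, here is a minimal sketch of what a test plan looks like when the null assumption is designed in from the start. All of the segment and content names are hypothetical illustrations, not anyone's actual implementation:

```python
# Sketch: designing the null assumption into a targeting test.
# All audience segments and content names are hypothetical.

proposed = {"audience": "high_value_returning", "content": "personalized_offer"}

# Null-assumption cells: each one challenges the belief that the
# proposed targeting is what actually drives the lift.
test_plan = [
    proposed,
    # What if the content wins regardless of who sees it?
    {"audience": "everyone", "content": "personalized_offer"},
    # What if an "irrelevant" segment responds just as well?
    {"audience": "firefox_users", "content": "personalized_offer"},
    # What if doing the exact opposite is better?
    {"audience": proposed["audience"], "content": "generic_offer"},
]

for cell in test_plan:
    print(f"Serve {cell['content']!r} to {cell['audience']!r}")
```

If the "everyone" or "firefox_users" cells perform as well as the proposed segment, the targeting itself was never the source of the value.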

This is not just a problem with analytics; it plays out with any sort of analysis, especially A/B testing. So many groups make the mistake of just testing their hypothesis against another, and in doing so fail to see the bigger picture. Hypothesis testing is designed to be absolutely sure of the validity of a single idea, not to compare many ideas or to reach a conclusion at a meaningful speed. It is the end point of a long, disciplined process, not the starting point where so many want to leverage it.

The final common way this plays out is when we mistake the rate of an action for the value of the action. We get so caught up in wanting to believe in some linear relation between items, that having a great promotion and getting more people to click on it equals more value, that we fail to measure the end goal. We mistake the concept we are trying to propagate for the end goal, assuming that if we are successful in pushing towards a desired action, we have accomplished our end goal. Having run on average 30 tests a week with different groups over the last 7 years, I can tell you from my own experience that the times when that linear relation has held in the real world I can count on one hand.
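Here is a minimal sketch of the distinction, with hypothetical numbers: the promotion wins decisively on click-through rate, yet the end goal, revenue per visitor, tells a different story:

```python
# Sketch: rate of an action vs. value of the action.
# All traffic and revenue numbers below are hypothetical.

variants = {
    "control":   {"visitors": 10_000, "clicks": 400, "revenue": 52_000.0},
    "promotion": {"visitors": 10_000, "clicks": 900, "revenue": 50_500.0},
}

for name, v in variants.items():
    ctr = v["clicks"] / v["visitors"]   # rate of the intermediate action
    rpv = v["revenue"] / v["visitors"]  # value: the actual end goal
    print(f"{name}: CTR={ctr:.1%}  RPV=${rpv:.2f}")

# CTR more than doubles (4.0% -> 9.0%), yet revenue per visitor drops
# ($5.20 -> $5.05): the rate of the action did not equal its value.
```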

So much analysis loses all value because we are pre-wired to just accept the first thing we find, to find data that confirms what we want to believe, or to send that data out to others to prove our point while ignoring the larger world. We are so wired to want to think we are making a difference that we constantly fail to discover whether this is true. Be better than what you are wired to believe and force yourself to think in terms of the null assumption. Think in terms of purposely looking at the opposite of what you are trying to prove or what you believe. The worst case is that you have spent a few more moments and truly confirmed what you believe. The best case is that you have changed your world view and gotten a better result, one that was not arrived at simply because you expected to arrive at that point.

MVT – Why Full Factorial vs. Partial Factorial Misses the Entire Point

One of my first introductions to the larger world of testing was getting a chance to serve on a panel about multivariate testing. I remember how divergent the opinions were and how bad the misconceptions about the entire process were. Just about everyone I talked to had the same common preconceived notions of how to use multivariate testing, and even worse, almost all those notions were based on their need to propagate their sales pitches. Now, as I work with more and more organizations, I see the same bad ideas replicating, and groups continue to miss the true value of multivariate testing. MVT is something that holds all these promises, but when done for the wrong reasons, it multiplies the worst of testing instead of facilitating the best of it. Even worse, groups then confuse the issue, focusing on the method of the test and not the fundamental mindset that created it. Many groups get into debates around the "value" of the different multivariate methods out there, which is nothing more than a fool's errand, since any method is going to fail when used to answer the wrong question.

Too many times people get caught up on the "advantages" or "disadvantages" of the various forms of multivariate analysis. There are many advantages to full factorial testing: fewer rules, better insight into interactions across tested elements, and the ability to test out non-uniform concept arrays. There are many advantages to partial factorial testing: speed, forced conformity to better testing rules, and more efficient use of resources. What does not matter is which one allows you to throw things at a wall and get an answer. When you are busy trying to answer the wrong question, you can fail with any tool. It is only when you are trying to succeed that the differences between tools matter.
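For reference, the mechanical difference between the two approaches is simply how many cells you run. A minimal sketch, with hypothetical factors and levels:

```python
# Sketch: cell counts for full vs. partial factorial designs, using
# hypothetical factors and levels.
from itertools import product

factors = {
    "size":  ["small", "medium", "large"],
    "color": ["orange", "red", "green"],
    "copy":  ["Purchase", "Buy Now", "Add to Cart"],
}

full = list(product(*factors.values()))
print(f"Full factorial: {len(full)} cells")  # 3 * 3 * 3 = 27

# A partial (fractional) factorial runs a structured subset of those
# cells and models the rest; a one-third fraction of this design would
# run 9 cells. Constructing a valid fraction is beyond this sketch.
print(f"One-third fraction: {len(full) // 3} cells")
```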

The fundamental use of multivariate testing for most groups is to combine multiple badly conceived A/B tests, throwing them all together quickly so they can find a combination that increases results. So many groups want to try out this combination of ideas that they think an MVT campaign is the solution. You can use the test that way; it is both a statistically valid approach and will guarantee a result, but at what cost? The problem is that you will be wasting resources and time, and you are guaranteed a suboptimal outcome from this flawed way of thinking. Any form of multivariate testing that is used as a massive collection of individual tests is always going to be inefficient, since you are replicating the imperfections of those individual tests in a way that magnifies them. If your goal is simply that individual outcome, and it is for far too many programs and especially agencies, then you will never get true value from multivariate testing until you change your mindset.

Fundamentally, the quest to just find a winning combination misses a deeper truth: you are spending a massive amount of resources creating all these permutations and offers without any understanding of the efficiency of each resource. Two costs compound:

1) All the ideas come from preconceptions and hypotheses about what works

2) Every new variant adds cost, both in its creation and in the data acquisition required for the results to be meaningful

If we instead focus on multivariate testing as a means to filter our resources instead of simply combining them, then we are able to achieve efficiency. If we want to limit our resources and only apply them where we will get the most return, then we must always view multivariate testing as a tool to learn and be efficient, not one to just throw things out to see what works.

The classic example of a multivariate test is testing a button. Let us say I have a medium orange purchase button currently on my site. I might think that red will be better than orange, and my UX person thinks that "buy now" will perform better because he saw it on a few competitor sites. You throw in a slightly larger button as well, and you get a predicted best combination of large orange "buy now." You slap yourself on the back, and you move forward. The reality is that each of those factors (size, color, and copy) has a massive number of feasible alternatives, and all we did was look at a very limited, biased set of them.

Let me propose a better way. Look at that same test, but instead of preconceiving the outcome, look for the value of each factor. Suppose we ran the same test and found out that size matters more than color, despite what we thought going in. If we spend as few resources as possible to achieve that understanding, then we have left the maximum amount of resources available to apply to the winning factor or element. Once we have learned that size matters, we can shift our resources away from less influential elements and apply them towards as many different feasible alternatives of the winning factor as possible. Instead of being limited to testing 3-4 sizes, we can know the value of size and then create as many different alternatives as possible. Not only have we used fewer resources, but they have been applied towards the most influential part of our experience.
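Here is a minimal sketch of what "looking for the value of each factor" means in practice. The conversion rates below are hypothetical; the point is the per-factor comparison, not the specific numbers:

```python
# Sketch: using a multivariate test to learn the value of each factor
# (main effects), with hypothetical conversion rates per cell.
from statistics import mean

levels = {
    "size":  ["medium", "large"],
    "color": ["orange", "red"],
    "copy":  ["Purchase", "Buy Now"],
}

# Hypothetical observed conversion rate for each full-factorial cell,
# keyed by (size, color, copy).
rates = {
    ("medium", "orange", "Purchase"): 0.040,
    ("medium", "orange", "Buy Now"):  0.041,
    ("medium", "red",    "Purchase"): 0.042,
    ("medium", "red",    "Buy Now"):  0.043,
    ("large",  "orange", "Purchase"): 0.049,
    ("large",  "orange", "Buy Now"):  0.050,
    ("large",  "red",    "Purchase"): 0.051,
    ("large",  "red",    "Buy Now"):  0.052,
}

for i, factor in enumerate(levels):
    # Average conversion across all cells sharing each level of the factor.
    level_means = {
        lvl: mean(r for cell, r in rates.items() if cell[i] == lvl)
        for lvl in levels[factor]
    }
    influence = max(level_means.values()) - min(level_means.values())
    print(f"{factor}: influence={influence:.3f}  levels={level_means}")

# Size swings conversion by ~0.9 points; color ~0.2; copy ~0.1.
# Future resources should go to exploring many more sizes, not more copy.
```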

Even better, I have now learned that size matters most, and I have an outcome that is different from and greater than what I would have had before. In fact, I have shifted the system so that the absolute worst thing that can happen is that I end up with the same alternative I would have had before, but for less time and fewer resources. I have also added a much higher upside, since an alternative that I would not have previously included can come out the winner. I have tested more alternatives of the important factor, so I am not limiting my output to the single input of popular opinion. I have leveraged multivariate testing as a way to learn what matters and to focus my future efforts on that. I no longer have to create alternatives for factors that have no influence, and can instead focus resources on testing as many different feasible alternatives as I can for the things that do influence behavior.

The less you spend to reach a conclusion, the greater the ROI. The faster you move, the faster you can get to the next piece of value, also increasing the outcome of your program. What is most important is to focus on the use of multivariate testing as a learning tool ONLY, one used to tell us where to apply resources. One that frees us up to test as many feasible alternatives as possible on the most valuable or influential factor, while eliminating the equivalent waste on factors that do not have the same impact. The goal is the outcome; getting overly caught up in doing it in one massive step, as opposed to smaller, easier steps, is fool's gold.

You CAN leverage multivariate tests in a large number of ways, and there are enough 15×8 tests out there to show that it is a statistically valid approach. The question is never what you can do, but what you SHOULD do. Just because I can test a massive number of permutations does not mean that I am being efficient or getting the return on my efforts that I should. You can't just ignore the context of the output to feel better about your results. You will get a result no matter what you do; the trick is constantly getting better results for fewer resources.

If you are stuck in the realm of trying to show results from a single test, or are not thinking of your testing program as a learning optimization machine, then you aren't going to get the results you need no matter what you do. Multivariate tests are useful only in the context of your program; if you are stuck thinking in terms of just the outcome of that specific test, you will never achieve the results that you want.

If you shift to thinking about it in the context of a larger program, then multivariate tests are just one of many tools you have at your disposal to achieve those goals. Don't let the promises and sales pitches of a few divert your attention away from what matters. And if you are focusing on what matters, then the question of which type of multivariate test you use becomes almost completely moot.

Bridging the Gap: Dealing with Variance between Data Systems

One of the problems that never seems to be eliminated from the world of data is education and understanding around comparing data between systems. When faced with the issue, too many companies read the variance between their different data solutions as a major sign of a problem with their reporting, but in reality variance between systems is expected. One of the hardest lessons that groups can learn is to focus on the value and the usage of information over the exact measure of the data. This plays out now more than ever, as more and more groups find themselves with a multitude of tools, all offering reporting and other features about their sites and their users. As users deal with the reality of multiple reporting solutions, they discover that all the tools report different numbers, be it visits, visitors, conversion rates, or just about anything else. There can be a startling realization that there is no single measure of what you are or what you are doing, and for some groups this can strip them of their faith in their data. This variance problem is nothing new, but if not understood correctly, it can lead to massive internal confusion and distrust of the data.

I had to learn this lesson the hard way. I worked for a large group of websites that used six different systems for basic analytics reporting alone. I led a team to dive into the different systems, understand why they reported different things, and figure out which one was "right." After losing months of time and almost losing complete faith in our data, we came away with some hard-won lessons. We learned that the use of the data is paramount, that there is no one view or right answer, that variance is almost completely predictable once you learn the systems, and that we would have been far better served spending that time on how to use the data instead of why the systems differed.

I want to help your organization avoid the mistakes that we made. The truth is that no matter how deep you go, you will never find all the reasons for the differences. The largest lesson we learned was that an organization can get so caught up in the quest for perfect data that it forgets the actual value of that data. To make sure you don't get caught in this trap, I want to help establish when and if you actually have a problem, the most common reasons for variance between systems, and some suggestions about how to think about and use the new data challenge that multiple reporting systems present.

Do you have a problem?

First, we must set some guidelines around when you have a variance problem and when you do not. Systems designed for different purposes will leverage their data in very different ways. No two systems will match, and in a lot of cases, numbers that are too close represent artificial constraints on the data that actually hinder its usability. At the same time, if the systems are too far apart, that is a sign that there might be a reporting issue with one or both of the solutions.

Here are two simple questions to evaluate if you do have a variance “problem”:

1) What is the variance percentage?

Normal variance between similar data systems is almost always between 15-20%.
For non-similar data systems the range is much larger, and is usually between 35-50%.

If the gap is too small or too large, then you may have a problem. A 2% variance is actually a worse sign than a 28% variance between similar data systems.

Many groups run into the issue of trying too hard to constrain variance. The result is that they put artificial constraints on their data, causing the representative nature of the data to be severely hampered. Just because you believe that variance should be lower does not mean that it really should be or that lower is always a good thing.

This analysis should be done on non-targeted groups of the same population (e.g., all users to a unique page). The variance for dependent tracking (segments) is always going to be higher.

2) Is the variance consistent in a small range?

You may see daily variance percentages in a series like 13, 17, 20, 14, 16, 21, 12 over a few days, but you should not see 5, 40, 22, 3, 78, 12. A quick sketch of both checks follows.
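Here is a minimal sketch of both checks, using hypothetical daily visit counts from two similar systems measuring the same untargeted population:

```python
# Sketch: the two variance checks, with hypothetical daily visit counts
# from two similar systems measuring the same untargeted population.
from statistics import mean, pstdev

system_a = [10_000, 11_200, 9_800, 10_500, 10_900, 11_500, 9_700]
system_b = [8_400,  9_300,  7_900, 9_000,  9_100,  9_200,  8_500]

# Daily variance percentage between the two systems.
daily_variance = [abs(a - b) / a * 100 for a, b in zip(system_a, system_b)]
print([f"{v:.0f}%" for v in daily_variance])

# Check 1: is the average variance in the normal band for similar
# systems (15-20%)?
avg = mean(daily_variance)
print(f"average variance: {avg:.1f}%  in normal band: {15 <= avg <= 20}")

# Check 2: is the variance consistent day to day, or swinging wildly?
print(f"day-to-day spread (std dev): {pstdev(daily_variance):.1f} points")
```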

If your variance is within the normal range and consistent, then congratulations, you are dealing with perfectly normal behavior, and I could not more strongly suggest that you spend your time and energy on how best to use the different data.

Data is only as valuable as how you use it, and while we love the idea of one perfect measure of the online world, we have to remember that each system is designed for a purpose, and that making one universal system comes with the cost of losing specialized function and value.

Always keep in mind these two questions when it comes to your data:

1) Do I feel confident that my data accurately reflects my users’ digital behavior?

2) Do I feel that things are tracked in a consistent and actionable fashion?

If you can’t answer those questions with a yes, then variance is not your issue. Variance is the measure of the differences between systems. If you are not confident in a single system, then there is no point in comparing it. Equally, if you are comfortable with both systems, then the differences between them should mean very little.

The most important thing I can suggest is that you pick a single data system as the system of record for each action you take. Every system is designed for a different purpose, and with that purpose in mind, each one has advantages and disadvantages. You can definitely look at each system for similar items, but when it comes time to act or report, you need to be consistent and have all concerned parties aligned on which system everyone looks at. Choosing how and why you are going to act before you get to that part of the process is the easiest and fastest way to ensure the reduction of organizational barriers. Getting this agreement is far more important going forward than a dive into the causes behind normal variance.

Why do systems always have variance?

For those of you who are still not completely sold or who need to at least have some quick answers for senior management, I want to make sure you are prepared.
Here are the most common reasons for variance between systems:

1) The rules of the system – Visit-based systems track things very differently than visitor-based systems; they are meant for very different purposes. In most cases, a visit-based system is used for incremental daily counting, while a visitor-based system is designed to measure action over time.

2) Cookies – Each system has different rules about tracking and storing cookie information over time. These rules dramatically impact what is or is not tracked, and this is even more true for first- versus third-party cookie solutions.

3) Rules of inclusion vs. Rules of exclusion – For the most part, all analytics solutions are rules of exclusion, meaning that you really have to do something (IP filter, data scrubbing, etc.) to not be tracked. A lot of other systems, especially testing, are rules of inclusion, meaning you have to meet very specific criteria to be tracked. This will dramatically impact the populations, and also any tracked metrics from those populations.

4) Definitions – What something means can be very specific to a system, be it a conversion, a segment, a referrer, or even a site action. An example would be a paid keyword segment: if I land on the site and then view a second page, what is the referrer for that page? Is it attributed to the visit or to the referring page? Is it something I did on an earlier visit?

5) Mechanical variance – There are mechanical differences in how systems track things. Are you tracking the click of a button with an onclick handler? Or is it the landing on the next page? Or the server request? Do you use a log file system or a beacon system? Is it a unique request or added on to the next page tag? Do you rely on cookies, or are all actions independent? What are the different timing mechanisms for each system? Do they collide with each other or with other site functions?

Every system does things differently, and these smaller differences build up over time, especially when combined with some of the other reasons listed above. There are hundreds of reasons beyond those listed, and the reality is that each situation is unique, the culmination of a hundred different factors. You will never get to the point where you can describe with 100% certainty why you get the variance you do.

Variance is not a new issue, but it is one that can be the death of programs if not dealt with in a proactive manner. Armed with this information, I would strongly suggest that you hold conversations with your data stakeholders before you run into the questions that inevitably come. Establishing what is normal, how you act, and a few reasons why you are dealing with the issue should help cut all of these problems off at the pass.