
Parakeet Tuxedos: Validity Revisited

Aug 1, 2001

I often hear people asking, "Is this test validated?" It is a reasonable question that everyone using a test for selection needs to ask. But does this question really tell you what you need to know? On the surface it sounds reasonable, but it makes about as much sense as asking, "Are your shoes validated?" and getting the reply, "My parakeet wears a tuxedo." The answer doesn't make any more sense than the question.

There is a whole process outlined for developing and validating tests. If you use tests, you can get your own copy of the Standards for Educational and Psychological Testing from the American Psychological Association. The Standards are the definitive source for test professionals, and some of their principles can help users separate the good, the bad, and the ugly.

What theory is the test based on?

Good tests are always based on an established theory. That is, there is a research base to guide item and scale development, not just someone's idea that the world needs another test. Some tests are based on communication theory, others on leadership theories, and still others on motivational theories. If you are using a test for selection, you need to ask the vendor for research showing the test is based on a theory of selection, not some training theory.

What purpose is the test supposed to serve?

If you have training responsibilities, you might need to know something about communication effectiveness, style awareness, or motivational drivers. But if you are hiring people, you must be careful not to assume that personality style, leadership type, or communication type equals performance. For one thing, styles may not have anything to do with job performance. For another, style differences tend to produce "out-of-the-box" thinking, a condition you want to keep, not extinguish.

What studies can you produce showing test scores are related to job performance?

Here is where incompetent vendors (I do not use this word lightly) excel at selling bad tests to unsuspecting users. Some vendors like to cite studies where they gave a test to a group of clerks or salespeople or managers and calculated their scores. Bogus! Group averages tell you only about group differences, not individual performance. For example, it may be fun to learn that engineers tend to be ISTJ and salespeople ENFP, but being an ISTJ or ENFP does not make you either an engineer or a salesperson. Test users must always be able to separate causal relationships ("a" causes "b") from correlational relationships ("a" happens when "b" happens). If correlation and causation were the same thing, then blonde hair would cause blue eyes.

Another example of silly vendor science is taking the average score of "high producers" and reporting that as a validity study. Bogus! Validity means that as test scores increase, job performance increases. A good validity study also shows that low scores equal low performance and middling scores equal middling performance. That is the whole idea behind validity. Averages are not validity evidence. (A short numerical sketch at the end of this article shows the difference.)

What are the different kinds of validity?

Experts often refer to validity in terms of face, content, construct, criterion, concurrent, or predictive. Face validity is straightforward: it refers to whether or not the test looks legitimate. A test without good face validity tends to irritate test takers because they see no obvious link to the job. But face validity is not all you need to know about. Concurrent and predictive refer to the kind of study used to establish validity.
Concurrent means the test was validated with a study of people already on the job (i.e., the study is "concurrent"). Predictive means people took the test, a few months passed, and then the relationship between test scores and job performance was measured (i.e., the test "predicts" performance). Predictive is the better of the two techniques, but it is seldom used because it takes so much time. (Business does not like to wait.)

This leaves the big ones. Construct validity refers to things like mental ability, motivation, or values: deep-seated mental "constructs" that theory associates with job performance. The EEOC does not like construct validity because it is open to so many interpretations and tends to discriminate.

That brings us to the two forms of validity that test users really need to worry about. Content validity means the content of the test "looks" a great deal like the job; it is similar to face validity. Content-valid tests include bar exams, business-problem exercises, securities exams, professional engineering exams, and technical tests. The theory is that you either know the subject content or you do not. Criterion-related (or criteria-related) validity means test scores predict job performance; that is, a person scoring 90 will outperform someone scoring 70. This kind of validity requires either a concurrent or a predictive design that compares jobholders' test scores with some kind of scaled performance "criterion."

Transportability

This is not a measure of whether the test is too heavy to carry from one room to another. It means that a properly done validity study for one job can be used for a similar job without redoing all the work. It also means it is up to you, the test user, to show the two jobs are very much alike. If the vendor has done a study on a job similar to yours, you still need to verify the two jobs are "like peas in a pod."

Are tests legal?

One good question deserves another. Do you hire everyone who applies? Do you turn down people based on interview data? Congratulations! You are in the test business. The government has not yet forced organizations to hire unqualified people. You just need to show that your test is validated.

Back to the original question

Question: Is your test validated?

Answer: Wet birds never fly at night.
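
For readers who want to see the averages-versus-validity point in numbers, here is a minimal sketch in Python. Every score and rating in it is made up purely for illustration; none of it comes from any real test, vendor, or study. It shows how a respectable-sounding "high producer" average can coexist with a score-to-performance correlation of essentially zero, which is the relationship criterion-related validity actually measures.

```python
# A toy illustration, not a real validity study. Every number below is invented.

def pearson(xs, ys):
    """Pearson correlation using only the standard library."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    sd_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (sd_x * sd_y)

# Hypothetical jobholders: a test score and a scaled performance rating (1-10).
scores  = [92, 88, 85, 83, 80, 78, 75, 71, 66, 60]
ratings = [ 4,  9,  3,  8,  5,  7,  6,  2,  9,  5]

everyone_avg = sum(scores) / len(scores)
high_producer_scores = [s for s, r in zip(scores, ratings) if r >= 7]
high_avg = sum(high_producer_scores) / len(high_producer_scores)

print(f"Average score, everyone:          {everyone_avg:.1f}")
print(f"Average score, 'high producers':  {high_avg:.1f}")
print(f"Score-performance correlation:    {pearson(scores, ratings):.2f}")

# The brochure line "our high producers average about 79 on the test" sounds
# like evidence, but the high producers score about the same as everyone else
# and the correlation is essentially zero: higher scores do not go with higher
# ratings, so this test tells you nothing about who will perform.
```

A vendor study worth paying for reports the relationship across the whole score range, not a single group average.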
