
Still Crazy After All These Years: The Validation Problem

Mar 6, 2000

OK. Suppose you listened to my "Chicken Little" speech and decided to do a validation study. Where do you start? What do you use for criteria? Supervisor ratings? The Fed discount rate? Shoe size? How about people in the study? Only the high performers? Everybody in the company? The boss's favorites? In this column, we'll discuss some validation basics. As usual, we'll try to "put ten pounds of information into a half-pound bag." By the way, although it is U.S. law that requires validation, low productivity and high turnover are international problems.

Three kinds of validation

To "validate" is to offer formal proof that your system works. This is a good thing. Offering proof begins with analyzing the job to see what is needed. But, and I cannot emphasize this next statement enough, job analysis is NOT a training needs analysis, it is NOT a job description (job scope and responsibility), and it is NOT a job evaluation (job pay). A job analysis identifies the skills and competencies required for job success. There are 16-week college courses on basic job analysis techniques. It is a subtle skill that takes years to master. Phew!

Now that you have done a job analysis, you must choose from three possible validation studies: content validation, criterion validation, and construct validation. The differences can be explained using a simple data entry clerk position. (Don't get overly optimistic; data entry is a simple example. Real life is much more complicated.) Suppose your job analysis identified typing skills as a critical part of job success. A test that required typing would be "content valid." On the other hand, if you want to predict overall job performance, you would have to show that typing scores relate to job performance. This is an example of "criterion" validity. Finally, if you discovered that attitude had something to do with keyboard skills, you could test for the psychological construct of "attitude." Construct validity is often very hard to identify or interpret, and the Feds advise against relying on it.

By the way, you should not use someone else's validity results unless your job analysis (there's that word again) shows both jobs are virtually the same. And stay away from the training department. Training tests are usually designed to deliver simple messages that can be communicated during short workshops. They are seldom designed to predict job performance. The temptation may be there, but forget about it! Use a test that was specifically designed to measure the content or criterion of the job.
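To make the "criterion" idea concrete, here is a minimal Python sketch using made-up numbers (the typing scores, the ratings, and the ten-person sample are all hypothetical). A criterion study ultimately boils down to a coefficient like the one computed here: the correlation between test scores and some measure of job performance.

    # Hypothetical data: typing-test scores and supervisor ratings for 10 clerks.
    # A criterion study asks: do test scores track job performance?
    test_scores = [42, 55, 61, 48, 70, 66, 52, 75, 58, 63]   # words per minute
    ratings     = [ 3,  3,  4,  2,  5,  4,  3,  5,  3,  4]   # 1-5 supervisor rating

    def pearson(x, y):
        """Pearson correlation coefficient between two equal-length lists."""
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sx = sum((a - mx) ** 2 for a in x) ** 0.5
        sy = sum((b - my) ** 2 for b in y) ** 0.5
        return cov / (sx * sy)

    print(f"criterion validity coefficient: r = {pearson(test_scores, ratings):.2f}")

A real study needs far more people and a significance test; the point here is only to show what the number represents.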

Who participates

Now that you have chosen your type of validity, you need to pick some kind of test, find some kind of rating criteria, and decide whether you will do a "concurrent" or "predictive" study. A concurrent study means that you will use current jobholders and current job ratings. Predictive means you will give the test to all new employees and wait nine months to a year to gather performance ratings. A predictive study is better (you get a wider range of scores), but most people don't have the patience to wait. We'll give the test to current jobholders and the rating sheet to managers.

At this point we want to let you know how to spot the less-than-competent vendors, or "lessies." A "lessie" argument goes like this: "Let's test only the high performers. That way we'll be able to match applicants against a 'high-performance average.'" Sound good? Actually, no. Sampling only high performers is a wrong-headed way to determine validity because (1) it tells you nothing about low performers, (2) "high performance" is usually very subjective (more on this later), and (3) it is a way to sell tests, not establish validity. If that doesn't convince you, think about this: it's unlikely that even your own high performers will match your group average! Remember, your objective (and the Feds' objective, as well) is to determine whether your test predicts both high and low productivity, not whether an applicant matches an average profile. If a "lessie" suggests profiling, it is probably because they either don't know good validation protocol, don't care, or their test does not predict performance. This is a bad thing.

A good validation study includes a range of performers from high to low, of all colors, races, ages, and genders. That is, you need the widest possible range of performance and people in order to investigate both adverse impact and the trustworthiness of your test. With one exception: leave out the "newbies" who have not had time to learn the job.
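A quick simulation shows why the "high performers only" pitch falls apart. The Python sketch below uses invented numbers: it builds a population in which test scores genuinely predict performance, then computes the correlation twice, once on the full range of performers and once on only the top quarter. The second number shrinks sharply, which is the restriction-of-range problem in action.

    import random
    import statistics

    random.seed(1)

    # Simulated population (hypothetical): performance is partly driven by the
    # test score, plus noise, so the "true" test-performance relationship is real.
    people = []
    for _ in range(1000):
        test = random.gauss(50, 10)
        performance = 0.6 * test + random.gauss(0, 8)
        people.append((test, performance))

    tests = [t for t, _ in people]
    perf = [p for _, p in people]
    print(f"full range of performers: r = {statistics.correlation(tests, perf):.2f}")

    # Now keep only the top 25% of performers -- the sample a "lessie" would use.
    cutoff = sorted(perf)[int(0.75 * len(perf))]
    top = [(t, p) for t, p in people if p >= cutoff]
    top_r = statistics.correlation([t for t, _ in top], [p for _, p in top])
    print(f"high performers only:     r = {top_r:.2f}")

Besides throwing away all the information about low performers, the truncated sample makes even a genuinely valid test look weak.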

What to rate

By now you may think you are halfway done. Think again. This is the time to decide what to rate. Your first challenge is to eliminate test "noise," or method error. For example, if you use overall productivity, you might also be measuring market factors, economic influences, raw material quality, production variability, and so on, all items that will tend to affect your results one way or another. Performance appraisals have legendary inaccuracy. If you use management ratings, you should remember that managers tend to bring personal biases to the table. In some organizations everyone gets high ratings, low ratings, or mid ratings, or (GASP!) managers rate people according to how much they like them. If you use customer service ratings, the customer may blame the person for things outside his or her control, like prices, quality, or shipping. If you use peer ratings, you might as well hold a popularity contest. Finding trustworthy performance criteria is an art; otherwise you may end up comparing good test results to garbage ratings. In any event, it is always a good idea to moderate a workshop where people discuss each item in behavioral terms before giving ratings.
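One inexpensive sanity check on your criteria, sketched below with made-up ratings, is to have two raters score the same people independently and see how well they agree. If the agreement is poor, the ratings are mostly noise and no test will validate against them; that is when the behavioral-terms workshop earns its keep.

    import statistics

    # Hypothetical: two managers independently rate the same eight employees (1-5 scale).
    manager_a = [4, 3, 5, 2, 4, 3, 5, 2]
    manager_b = [3, 3, 4, 2, 5, 2, 4, 3]

    agreement = statistics.correlation(manager_a, manager_b)
    print(f"inter-rater agreement: r = {agreement:.2f}")

    # Weak agreement suggests the rating instrument, not the test, needs fixing
    # before it is used as a validation criterion.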

Doing the math

Now that you have a broad range of people and trustworthy performance ratings, it is time to do the math. First, you have to be sure your numbers are large enough. Fewer than 25 people is very chancy, 50 is grudgingly OK, 100 is better, and several hundred is good. As a general rule of thumb, you should have about 10 subjects for every variable you want to test. That is to say, you should rate no more than 10 variables for every 100 people. Why is all this important? A low number of subjects and/or a high number of variables can screw up your results. You may find "relationships" that are not real, or you may overlook subtle relationships that are hidden among the "noise." Remember, you are working with a group of people who have a lot in common (i.e., they haven't been "canned" for non-performance). The individual differences will be subtle and you must examine them carefully. If you think it's been fun so far, think about this: the relationship between test numbers and job performance is seldom linear. More is not always better. Sometimes test scores fall along a curve where more is less or where more is "so-so." You have to be prepared to use scatter graphs and both linear and non-linear statistical techniques to examine your data from all angles.
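As a rough illustration of this step, the Python sketch below uses a hypothetical sample in which mid-range scorers get the best ratings. It checks the sample-size rules of thumb, then shows how a single linear correlation can read as "no relationship" even when the data hold a strong, curved one, which is why the column insists on scatter graphs and non-linear techniques.

    import statistics

    # Hypothetical validation sample: one test score and one supervisor rating
    # per person, deliberately curved so mid-range scorers get the best ratings.
    test_scores = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90]
    ratings     = [ 1,  1,  2,  3,  4,  4,  5,  5,  5,  4,  4,  3,  2,  1,  1]

    # Sample-size rules of thumb from the column: at least 25 people, and roughly
    # 10 subjects for every variable you want to test.
    n_subjects, n_variables = len(test_scores), 1
    if n_subjects < 25 or n_subjects < 10 * n_variables:
        print("warning: sample is too small -- results will be chancy")

    # A single linear coefficient misses the curve entirely...
    r = statistics.correlation(test_scores, ratings)
    print(f"Pearson r = {r:.2f}  (looks like 'no relationship')")

    # ...but banding the scores reveals a strong 'more is so-so' pattern.
    for label, lo, hi in [("low", 0, 40), ("mid", 40, 70), ("high", 70, 101)]:
        band = [rt for sc, rt in zip(test_scores, ratings) if lo <= sc < hi]
        print(f"{label:>4} scorers: average rating = {statistics.mean(band):.1f}")

With real data you would plot the scatter and fit non-linear models rather than eyeball score bands, but the lesson is the same: one correlation coefficient is not the whole story.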

Reporting the results

OK, assuming you have not gone into "overload," your final task is to write up your results. This includes documenting the job analysis data, the business need and job necessity, the people selected, their backgrounds and some work history, the test you chose and how it was administered, the performance criteria and how and why they were chosen, the workshop administration data, the statistical methods used, your findings, the impact on any protected groups, and your contact data.

Conclusion

By now, I hope I have convinced you that you need professional help. Like I said, this is just an overview. It takes time and practice to get good, so don't complain if your consultant prices the job at $25K to $50K. That is significantly less than either the attorney fees you could face or the productivity gains you stand to realize. If you would like more information, you can go to the DOL website; call the American Psychological Association and purchase a copy of The Standards for Educational and Psychological Testing; send me an email asking for a copy of the DOL's booklet entitled Testing and Assessment: An Employer's Guide to Good Practices; or ask me to send you my highly abridged version of the Uniform Guidelines on Employee Selection Procedures (1978).
